x86: percpu_to_op() misses memory and flags clobbers

Message ID 49D32212.80607@cosmosbay.com
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Eric Dumazet April 1, 2009, 8:13 a.m. UTC
While playing with the new percpu_{read|write|add|sub} stuff in the network tree,
I found the x86 asm was a little bit optimistic.

We need to tell gcc that percpu_{write|add|sub|or|xor} are modifying
memory and possibly eflags. We could add another parameter to percpu_to_op()
to separate the plain "mov" case (which does not change eflags),
but let's keep it simple for the moment.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jeremy Fitzhardinge April 1, 2009, 9:02 a.m. UTC | #1
Eric Dumazet wrote:
> While playing with the new percpu_{read|write|add|sub} stuff in the network tree,
> I found the x86 asm was a little bit optimistic.
>
> We need to tell gcc that percpu_{write|add|sub|or|xor} are modifying
> memory and possibly eflags. We could add another parameter to percpu_to_op()
> to separate the plain "mov" case (which does not change eflags),
> but let's keep it simple for the moment.
>   

Did you observe an actual failure that this patch fixed?

> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index aee103b..fd4f8ec 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -82,22 +82,26 @@ do {							\
>  	case 1:						\
>  		asm(op "b %1,"__percpu_arg(0)		\
>  		    : "+m" (var)			\
> -		    : "ri" ((T__)val));			\
> +		    : "ri" ((T__)val)			\
> +		    : "memory", "cc");			\
>   

This shouldn't be necessary.   The "+m" already tells gcc that var is a 
memory input and output, and there are no other memory side-effects 
which it needs to be aware of; clobbering "memory" will force gcc to 
reload all register-cached memory, which is a pretty hard hit.  I think 
all asms implicitly clobber "cc", so that shouldn't have any effect, but 
it does no harm.

Now, it's true that the asm isn't actually modifying var itself, but 
%gs:var, which is a different location.  But from gcc's perspective that 
shouldn't matter, because var makes a perfectly good proxy for that 
location, and gcc will correctly order all accesses to var.

I'd be surprised if this were broken, because we'd be seeing all sorts 
of strange crashes all over the place.  We've seen it before when the 
old x86-64 pda code didn't have proper constraints on its asm statements.

    J
Eric Dumazet April 1, 2009, 10:14 a.m. UTC | #2
Jeremy Fitzhardinge a écrit :
> Eric Dumazet wrote:
>> While playing with the new percpu_{read|write|add|sub} stuff in the network tree,
>> I found the x86 asm was a little bit optimistic.
>>
>> We need to tell gcc that percpu_{write|add|sub|or|xor} are modifying
>> memory and possibly eflags. We could add another parameter to
>> percpu_to_op()
>> to separate the plain "mov" case (which does not change eflags),
>> but let's keep it simple for the moment.
>>   
> 
> Did you observe an actual failure that this patch fixed?
> 

Not in the current tree, as we don't use percpu_xxxx() very much yet.

If it is deployed for SNMP mibs with hundreds of call sites,
can you guarantee it will work as is?

>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>>
>> diff --git a/arch/x86/include/asm/percpu.h
>> b/arch/x86/include/asm/percpu.h
>> index aee103b..fd4f8ec 100644
>> --- a/arch/x86/include/asm/percpu.h
>> +++ b/arch/x86/include/asm/percpu.h
>> @@ -82,22 +82,26 @@ do {                            \
>>      case 1:                        \
>>          asm(op "b %1,"__percpu_arg(0)        \
>>              : "+m" (var)            \
>> -            : "ri" ((T__)val));            \
>> +            : "ri" ((T__)val)            \
>> +            : "memory", "cc");            \
>>   
> 
> This shouldn't be necessary.   The "+m" already tells gcc that var is a
> memory input and output, and there are no other memory side-effects
> which it needs to be aware of; clobbering "memory" will force gcc to
> reload all register-cached memory, which is a pretty hard hit.  I think
> all asms implicitly clobber "cc", so that shouldn't have any effect, but
> it does no harm.


So, we can probably clean up many asms in the tree :)

static inline void __down_read(struct rw_semaphore *sem)
{
        asm volatile("# beginning down_read\n\t"
                     LOCK_PREFIX "  incl      (%%eax)\n\t"
                     /* adds 0x00000001, returns the old value */
                     "  jns        1f\n"
                     "  call call_rwsem_down_read_failed\n"
                     "1:\n\t"
                     "# ending down_read\n\t"
                     : "+m" (sem->count)
                     : "a" (sem)
                     : "memory", "cc");
}




> 
> Now, it's true that the asm isn't actually modifying var itself, but
> %gs:var, which is a different location.  But from gcc's perspective that
> shouldn't matter, because var makes a perfectly good proxy for that
> location, and gcc will correctly order all accesses to var.
> 
> I'd be surprised if this were broken, because we'd be seeing all sorts
> of strange crashes all over the place.  We've seen it before when the
> old x86-64 pda code didn't have proper constraints on its asm statements.

I was not saying it is broken, but a "little bit optimistic" :)

Better safe than sorry, because those errors are very hard to track, since
they depend a lot on how aggressive gcc is. I don't have time to test
all the gcc versions out there.


Ingo Molnar April 1, 2009, 4:12 p.m. UTC | #3
* Eric Dumazet <dada1@cosmosbay.com> wrote:

> Jeremy Fitzhardinge a écrit :
> > Eric Dumazet wrote:
> >> While playing with the new percpu_{read|write|add|sub} stuff in the network tree,
> >> I found the x86 asm was a little bit optimistic.
> >>
> >> We need to tell gcc that percpu_{write|add|sub|or|xor} are modifying
> >> memory and possibly eflags. We could add another parameter to
> >> percpu_to_op()
> >> to separate the plain "mov" case (which does not change eflags),
> >> but let's keep it simple for the moment.
> >>   
> > 
> > Did you observe an actual failure that this patch fixed?
> > 
> 
> Not in the current tree, as we don't use percpu_xxxx() very much yet.
> 
> If it is deployed for SNMP mibs with hundreds of call sites,
> can you guarantee it will work as is?

Do we "guarantee" it for you? No.

Is it expected to work just fine? Yes.

Are there any known bugs in this area? No.

Will we fix it if it's demonstrated to be broken? Of course! :-)

[ Btw., it's definitely cool that you will make heavy use of it for 
  SNMP mib statistics - please share with us your experiences with 
  the facilities - good or bad experiences alike! ]

> >> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> >>
> >> diff --git a/arch/x86/include/asm/percpu.h
> >> b/arch/x86/include/asm/percpu.h
> >> index aee103b..fd4f8ec 100644
> >> --- a/arch/x86/include/asm/percpu.h
> >> +++ b/arch/x86/include/asm/percpu.h
> >> @@ -82,22 +82,26 @@ do {                            \
> >>      case 1:                        \
> >>          asm(op "b %1,"__percpu_arg(0)        \
> >>              : "+m" (var)            \
> >> -            : "ri" ((T__)val));            \
> >> +            : "ri" ((T__)val)            \
> >> +            : "memory", "cc");            \
> >>   
> > 
> > This shouldn't be necessary.   The "+m" already tells gcc that var is a
> > memory input and output, and there are no other memory side-effects
> > which it needs to be aware of; clobbering "memory" will force gcc to
> > reload all register-cached memory, which is a pretty hard hit.  I think
> > all asms implicitly clobber "cc", so that shouldn't have any effect, but
> > it does no harm.
> 
> 
> So, we can probably clean up many asms in the tree :)
> 
> static inline void __down_read(struct rw_semaphore *sem)
> {
>         asm volatile("# beginning down_read\n\t"
>                      LOCK_PREFIX "  incl      (%%eax)\n\t"
>                      /* adds 0x00000001, returns the old value */
>                      "  jns        1f\n"
>                      "  call call_rwsem_down_read_failed\n"
>                      "1:\n\t"
>                      "# ending down_read\n\t"
>                      : "+m" (sem->count)
>                      : "a" (sem)
>                      : "memory", "cc");
> }

Hm, what's your point with pasting this inline function?

> > Now, it's true that the asm isn't actually modifying var itself, but
> > %gs:var, which is a different location.  But from gcc's perspective that
> > shouldn't matter, because var makes a perfectly good proxy for that
> > location, and gcc will correctly order all accesses to var.
> > 
> > I'd be surprised if this were broken, because we'd be seeing all sorts
> > of strange crashes all over the place.  We've seen it before when the
> > old x86-64 pda code didn't have proper constraints on its asm statements.
> 
> I was not saying it is broken, but a "little bit optimistic" :)
> 
> Better safe than sorry, because those errors are very hard to 
> track, since they depend a lot on how aggressive gcc is. I 
> don't have time to test all the gcc versions out there.

Well, Jeremy has already made the valid point that your patch 
pessimises the constraints and hence likely causes worse code.

We can only apply assembly constraint patches that:

 - fix a demonstrated bug,

 - improve (speed up) the code emitted,

 - or, very rarely, don't actually make the code worse (they are
   an invariant) but are perceived to be safer.

This patch passes none of these tests, and in fact it will 
probably make the generated code worse.

	Ingo
Jeremy Fitzhardinge April 1, 2009, 4:41 p.m. UTC | #4
Ingo Molnar wrote:
>>                      : "memory", "cc");
>> }
>>     
>
> Hm, what's your point with pasting this inline function?
>   

He's pointing out the redundant (but harmless) "cc" clobber.

    J
Ingo Molnar April 1, 2009, 4:44 p.m. UTC | #5
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
>>>                      : "memory", "cc");
>>> }
>>>     
>>
>> Hm, what's your point with pasting this inline function?
>>   
>
> He's pointing out the redundant (but harmless) "cc" clobber.

ah, yes. We are completely inconsistent about that. It doesn't
matter on x86, so I guess it could be removed everywhere.

	Ingo

Patch

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index aee103b..fd4f8ec 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -82,22 +82,26 @@  do {							\
 	case 1:						\
 		asm(op "b %1,"__percpu_arg(0)		\
 		    : "+m" (var)			\
-		    : "ri" ((T__)val));			\
+		    : "ri" ((T__)val)			\
+		    : "memory", "cc");			\
 		break;					\
 	case 2:						\
 		asm(op "w %1,"__percpu_arg(0)		\
 		    : "+m" (var)			\
-		    : "ri" ((T__)val));			\
+		    : "ri" ((T__)val)			\
+		    : "memory", "cc");			\
 		break;					\
 	case 4:						\
 		asm(op "l %1,"__percpu_arg(0)		\
 		    : "+m" (var)			\
-		    : "ri" ((T__)val));			\
+		    : "ri" ((T__)val)			\
+		    : "memory", "cc");			\
 		break;					\
 	case 8:						\
 		asm(op "q %1,"__percpu_arg(0)		\
 		    : "+m" (var)			\
-		    : "re" ((T__)val));			\
+		    : "re" ((T__)val)			\
+		    : "memory", "cc");			\
 		break;					\
 	default: __bad_percpu_size();			\
 	}						\