diff mbox series

core/ipmi: Fix use-after-free

Message ID 20191014170441.12536-1-hegdevasant@linux.vnet.ibm.com
State Accepted
Headers show
Series core/ipmi: Fix use-after-free | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch master (1785745d5a7eaefd7d0c135f2a3b0f5d86aefec5)
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot success Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco success Signed-off-by present

Commit Message

Vasant Hegde Oct. 14, 2019, 5:04 p.m. UTC
Commit f01cd77 introduced backend poller() for ipmi message. But in some
corner cases its possible that we endup calling poller() after freeing
ipmi message.

Thread 1 :
  ipmi_queue_msg_sync()
    Waiting for ipmi sync message to complete

Thread 2 :
  bt_poll() -> ipmi_cmd_done() -> callback handler -> free message

Oliver hit this issue during fast-reboot test with skiboot DEBUG build.
In debug build we poision the memory after free. That helped us to catch
this issue.

[  460.295570781,3] ***********************************************
[  460.295773157,3] Fatal MCE at 0000000030035cb4   .ipmi_queue_msg_sync+0x110  MSR 9000000000201002
[  460.295887496,3] CFAR : 0000000030035ce8 MSR  : 9000000000000000
[  460.295956419,3] SRR0 : 0000000030035cb4 SRR1 : 9000000000201002
[  460.296035015,3] HSRR0: 0000000030012624 HSRR1: 9000000002803002
[  460.296102413,3] DSISR: 00000008         DAR  : 99999999999999d1
[  460.296169710,3] LR   : 0000000030035ce4 CTR  : 0000000030002880
[  460.296248482,3] CR   : 28002422         XER  : 20040000
[  460.296336621,3] GPR00: 0000000030035ce4 GPR16: 00000000301d36d8
[  460.296415449,3] GPR01: 0000000031c133d0 GPR17: 00000000300f5cd8
[  460.296482811,3] GPR02: 0000000030142700 GPR18: 0000000030407ff0
[  460.296550265,3] GPR03: 0000000000000100 GPR19: 0000000000000000
[  460.296629041,3] GPR04: 0000000028002424 GPR20: 0000000000000000
[  460.296696369,3] GPR05: 0000000020040000 GPR21: 0000000030121d73
[  460.296820977,3] GPR06: c000001fffffd480 GPR22: 0000000030121dd2
[  460.296888226,3] GPR07: c000001fffffd480 GPR23: 0000000030613400
[  460.296978218,3] GPR08: 0000000000000001 GPR24: 0000000000000001
[  460.297056871,3] GPR09: 9999999999999999 GPR25: 0000000031c13960
[  460.297124647,3] GPR10: 0000000000000000 GPR26: 0000000000000004
[  460.297203811,3] GPR11: 0000000000000000 GPR27: 0000000000000003
[  460.297271250,3] GPR12: 0000000028002424 GPR28: 0000000030613400
[  460.297339026,3] GPR13: 0000000031c10000 GPR29: 0000000030406b50
[  460.297417605,3] GPR14: 00000000300f58f8 GPR30: 0000000030406b40
[  460.297485176,3] GPR15: 00000000300f58d8 GPR31: 00000000309249c8

Reported-by: Oliver O'Halloran <oohall@gmail.com>
Fixes: f01cd77 (ipmi: ensure forward progress on ipmi_queue_msg_sync())
Cc: skiboot-stable@lists.ozlabs.org # v6.3+
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
 core/ipmi.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Comments

Oliver O'Halloran Oct. 15, 2019, 3:37 a.m. UTC | #1
On Tue, Oct 15, 2019 at 4:04 AM Vasant Hegde
<hegdevasant@linux.vnet.ibm.com> wrote:
>
> Commit f01cd77 introduced backend poller() for ipmi message. But in some
> corner cases its possible that we endup calling poller() after freeing
> ipmi message.
>
> Thread 1 :
>   ipmi_queue_msg_sync()
>     Waiting for ipmi sync message to complete
>
> Thread 2 :
>   bt_poll() -> ipmi_cmd_done() -> callback handler -> free message
>
> Oliver hit this issue during fast-reboot test with skiboot DEBUG build.
> In debug build we poision the memory after free. That helped us to catch
> this issue.
>
> [  460.295570781,3] ***********************************************
> [  460.295773157,3] Fatal MCE at 0000000030035cb4   .ipmi_queue_msg_sync+0x110  MSR 9000000000201002
> [  460.295887496,3] CFAR : 0000000030035ce8 MSR  : 9000000000000000
> [  460.295956419,3] SRR0 : 0000000030035cb4 SRR1 : 9000000000201002
> [  460.296035015,3] HSRR0: 0000000030012624 HSRR1: 9000000002803002
> [  460.296102413,3] DSISR: 00000008         DAR  : 99999999999999d1
> [  460.296169710,3] LR   : 0000000030035ce4 CTR  : 0000000030002880
> [  460.296248482,3] CR   : 28002422         XER  : 20040000
> [  460.296336621,3] GPR00: 0000000030035ce4 GPR16: 00000000301d36d8
> [  460.296415449,3] GPR01: 0000000031c133d0 GPR17: 00000000300f5cd8
> [  460.296482811,3] GPR02: 0000000030142700 GPR18: 0000000030407ff0
> [  460.296550265,3] GPR03: 0000000000000100 GPR19: 0000000000000000
> [  460.296629041,3] GPR04: 0000000028002424 GPR20: 0000000000000000
> [  460.296696369,3] GPR05: 0000000020040000 GPR21: 0000000030121d73
> [  460.296820977,3] GPR06: c000001fffffd480 GPR22: 0000000030121dd2
> [  460.296888226,3] GPR07: c000001fffffd480 GPR23: 0000000030613400
> [  460.296978218,3] GPR08: 0000000000000001 GPR24: 0000000000000001
> [  460.297056871,3] GPR09: 9999999999999999 GPR25: 0000000031c13960
> [  460.297124647,3] GPR10: 0000000000000000 GPR26: 0000000000000004
> [  460.297203811,3] GPR11: 0000000000000000 GPR27: 0000000000000003
> [  460.297271250,3] GPR12: 0000000028002424 GPR28: 0000000030613400
> [  460.297339026,3] GPR13: 0000000031c10000 GPR29: 0000000030406b50
> [  460.297417605,3] GPR14: 00000000300f58f8 GPR30: 0000000030406b40
> [  460.297485176,3] GPR15: 00000000300f58d8 GPR31: 00000000309249c8
>
> Reported-by: Oliver O'Halloran <oohall@gmail.com>
> Fixes: f01cd77 (ipmi: ensure forward progress on ipmi_queue_msg_sync())
> Cc: skiboot-stable@lists.ozlabs.org # v6.3+
> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> ---
>  core/ipmi.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/core/ipmi.c b/core/ipmi.c
> index 566300e13..ab2931bc4 100644
> --- a/core/ipmi.c
> +++ b/core/ipmi.c
> @@ -126,6 +126,8 @@ int ipmi_dequeue_msg(struct ipmi_msg *msg)
>
>  void ipmi_cmd_done(uint8_t cmd, uint8_t netfn, uint8_t cc, struct ipmi_msg *msg)
>  {
> +       bool clr_sync_msg = false;
> +
>         msg->cc = cc;
>         if (msg->cmd != cmd) {
>                 prerror("IPMI: Incorrect cmd 0x%02x in response\n", cmd);
> @@ -138,6 +140,10 @@ void ipmi_cmd_done(uint8_t cmd, uint8_t netfn, uint8_t cc, struct ipmi_msg *msg)
>         }
>         msg->netfn = netfn;
>
> +       if (msg == sync_msg)
> +               clr_sync_msg = true;
> +
> +
>         if (cc != IPMI_CC_NO_ERROR) {
>                 prlog(PR_DEBUG, "IPMI: Got error response. cmd=0x%x, netfn=0x%x,"
>                       " rc=0x%02x\n", msg->cmd, msg->netfn >> 2, msg->cc);
> @@ -151,12 +157,14 @@ void ipmi_cmd_done(uint8_t cmd, uint8_t netfn, uint8_t cc, struct ipmi_msg *msg)
>            completion functions. */
>
>         /* If this is a synchronous message flag that we are done */
> -       if (msg == sync_msg)
> +       if (clr_sync_msg)
>                 sync_msg = NULL;
>  }

Hmm, should have looked at this closer before pushing it through CI
and into master. Oh well.

The clr_sync_msg flag here isn't really necessary. We might have
freed() the memory pointed to be msg, but msg itself is just a local
variable and comparing it with sync_msg is just a pointer comparison
which is perfectly safe. I think we might need a barrier after setting
sync_msg = NULL too to ensure that the update to sync_msg is visible
to other threads. It probably doesn't matter much since the other
threads will just block until they see the update though.

That said, merged to master as
d75e82dbfbb9443efeb3f9a5921ac23605aab469. Thanks for looking into it.
Vasant Hegde Oct. 15, 2019, 5:18 a.m. UTC | #2
On 10/15/19 9:07 AM, Oliver O'Halloran wrote:
> On Tue, Oct 15, 2019 at 4:04 AM Vasant Hegde
> <hegdevasant@linux.vnet.ibm.com> wrote:
>>
>> Commit f01cd77 introduced backend poller() for ipmi message. But in some
>> corner cases its possible that we endup calling poller() after freeing
>> ipmi message.
>>
>> Thread 1 :
>>    ipmi_queue_msg_sync()
>>      Waiting for ipmi sync message to complete
>>
>> Thread 2 :
>>    bt_poll() -> ipmi_cmd_done() -> callback handler -> free message
>>
>> Oliver hit this issue during fast-reboot test with skiboot DEBUG build.
>> In debug build we poision the memory after free. That helped us to catch
>> this issue.
>>
>> [  460.295570781,3] ***********************************************
>> [  460.295773157,3] Fatal MCE at 0000000030035cb4   .ipmi_queue_msg_sync+0x110  MSR 9000000000201002
>> [  460.295887496,3] CFAR : 0000000030035ce8 MSR  : 9000000000000000
>> [  460.295956419,3] SRR0 : 0000000030035cb4 SRR1 : 9000000000201002
>> [  460.296035015,3] HSRR0: 0000000030012624 HSRR1: 9000000002803002
>> [  460.296102413,3] DSISR: 00000008         DAR  : 99999999999999d1
>> [  460.296169710,3] LR   : 0000000030035ce4 CTR  : 0000000030002880
>> [  460.296248482,3] CR   : 28002422         XER  : 20040000
>> [  460.296336621,3] GPR00: 0000000030035ce4 GPR16: 00000000301d36d8
>> [  460.296415449,3] GPR01: 0000000031c133d0 GPR17: 00000000300f5cd8
>> [  460.296482811,3] GPR02: 0000000030142700 GPR18: 0000000030407ff0
>> [  460.296550265,3] GPR03: 0000000000000100 GPR19: 0000000000000000
>> [  460.296629041,3] GPR04: 0000000028002424 GPR20: 0000000000000000
>> [  460.296696369,3] GPR05: 0000000020040000 GPR21: 0000000030121d73
>> [  460.296820977,3] GPR06: c000001fffffd480 GPR22: 0000000030121dd2
>> [  460.296888226,3] GPR07: c000001fffffd480 GPR23: 0000000030613400
>> [  460.296978218,3] GPR08: 0000000000000001 GPR24: 0000000000000001
>> [  460.297056871,3] GPR09: 9999999999999999 GPR25: 0000000031c13960
>> [  460.297124647,3] GPR10: 0000000000000000 GPR26: 0000000000000004
>> [  460.297203811,3] GPR11: 0000000000000000 GPR27: 0000000000000003
>> [  460.297271250,3] GPR12: 0000000028002424 GPR28: 0000000030613400
>> [  460.297339026,3] GPR13: 0000000031c10000 GPR29: 0000000030406b50
>> [  460.297417605,3] GPR14: 00000000300f58f8 GPR30: 0000000030406b40
>> [  460.297485176,3] GPR15: 00000000300f58d8 GPR31: 00000000309249c8
>>
>> Reported-by: Oliver O'Halloran <oohall@gmail.com>
>> Fixes: f01cd77 (ipmi: ensure forward progress on ipmi_queue_msg_sync())
>> Cc: skiboot-stable@lists.ozlabs.org # v6.3+
>> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
>> ---
>>   core/ipmi.c | 14 +++++++++++---
>>   1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/core/ipmi.c b/core/ipmi.c
>> index 566300e13..ab2931bc4 100644
>> --- a/core/ipmi.c
>> +++ b/core/ipmi.c
>> @@ -126,6 +126,8 @@ int ipmi_dequeue_msg(struct ipmi_msg *msg)
>>
>>   void ipmi_cmd_done(uint8_t cmd, uint8_t netfn, uint8_t cc, struct ipmi_msg *msg)
>>   {
>> +       bool clr_sync_msg = false;
>> +
>>          msg->cc = cc;
>>          if (msg->cmd != cmd) {
>>                  prerror("IPMI: Incorrect cmd 0x%02x in response\n", cmd);
>> @@ -138,6 +140,10 @@ void ipmi_cmd_done(uint8_t cmd, uint8_t netfn, uint8_t cc, struct ipmi_msg *msg)
>>          }
>>          msg->netfn = netfn;
>>
>> +       if (msg == sync_msg)
>> +               clr_sync_msg = true;
>> +
>> +
>>          if (cc != IPMI_CC_NO_ERROR) {
>>                  prlog(PR_DEBUG, "IPMI: Got error response. cmd=0x%x, netfn=0x%x,"
>>                        " rc=0x%02x\n", msg->cmd, msg->netfn >> 2, msg->cc);
>> @@ -151,12 +157,14 @@ void ipmi_cmd_done(uint8_t cmd, uint8_t netfn, uint8_t cc, struct ipmi_msg *msg)
>>             completion functions. */
>>
>>          /* If this is a synchronous message flag that we are done */
>> -       if (msg == sync_msg)
>> +       if (clr_sync_msg)
>>                  sync_msg = NULL;
>>   }
> 
> Hmm, should have looked at this closer before pushing it through CI
> and into master. Oh well.
> 
> The clr_sync_msg flag here isn't really necessary. We might have

Oh yeah.. I will send fix patch.

IMO sync ipmi_msg's shouldn't have callback handler itself.. .. as we go back to 
caller as soon as
messages completes. Flow should be :
   ipmi_msg_sync -> wait for response from BMC -> go back to caller -> post 
processing


-Vasant

> freed() the memory pointed to be msg, but msg itself is just a local
> variable and comparing it with sync_msg is just a pointer comparison
> which is perfectly safe. I think we might need a barrier after setting
> sync_msg = NULL too to ensure that the update to sync_msg is visible
> to other threads. It probably doesn't matter much since the other
> threads will just block until they see the update though.
> 
> That said, merged to master as
> d75e82dbfbb9443efeb3f9a5921ac23605aab469. Thanks for looking into it.
>
diff mbox series

Patch

diff --git a/core/ipmi.c b/core/ipmi.c
index 566300e13..ab2931bc4 100644
--- a/core/ipmi.c
+++ b/core/ipmi.c
@@ -126,6 +126,8 @@  int ipmi_dequeue_msg(struct ipmi_msg *msg)
 
 void ipmi_cmd_done(uint8_t cmd, uint8_t netfn, uint8_t cc, struct ipmi_msg *msg)
 {
+	bool clr_sync_msg = false;
+
 	msg->cc = cc;
 	if (msg->cmd != cmd) {
 		prerror("IPMI: Incorrect cmd 0x%02x in response\n", cmd);
@@ -138,6 +140,10 @@  void ipmi_cmd_done(uint8_t cmd, uint8_t netfn, uint8_t cc, struct ipmi_msg *msg)
 	}
 	msg->netfn = netfn;
 
+	if (msg == sync_msg)
+		clr_sync_msg = true;
+
+
 	if (cc != IPMI_CC_NO_ERROR) {
 		prlog(PR_DEBUG, "IPMI: Got error response. cmd=0x%x, netfn=0x%x,"
 		      " rc=0x%02x\n", msg->cmd, msg->netfn >> 2, msg->cc);
@@ -151,12 +157,14 @@  void ipmi_cmd_done(uint8_t cmd, uint8_t netfn, uint8_t cc, struct ipmi_msg *msg)
 	   completion functions. */
 
 	/* If this is a synchronous message flag that we are done */
-	if (msg == sync_msg)
+	if (clr_sync_msg)
 		sync_msg = NULL;
 }
 
 void ipmi_queue_msg_sync(struct ipmi_msg *msg)
 {
+	void (*poll)(void) = msg->backend->poll;
+
 	if (!ipmi_present())
 		return;
 
@@ -181,8 +189,8 @@  void ipmi_queue_msg_sync(struct ipmi_msg *msg)
 	 * progress.
 	 */
 	while (sync_msg == msg) {
-		if (msg->backend->poll)
-			msg->backend->poll();
+		if (poll)
+			poll();
 		time_wait_ms(10);
 	}
 }