Bluetooth: Ignore CC events not matching the last HCI command
diff mbox series

Message ID 20190424060704.22721-1-jprvita@endlessm.com
State Awaiting Upstream
Delegated to: David Miller
Headers show
Series
  • Bluetooth: Ignore CC events not matching the last HCI command
Related show

Commit Message

=?UTF-8?q?Jo=C3=A3o=20Paulo=20Rechi=20Vita?= April 24, 2019, 6:07 a.m. UTC
This commit makes the kernel not send the next queued HCI command until
a command complete arrives for the last HCI command sent to the
controller. This change avoids a problem with some buggy controllers
(seen on two SKUs of QCA9377) that send an extra command complete event
for the previous command after the kernel had already sent a new HCI
command to the controller.

The problem was reproduced when starting an active scanning procedure,
where an extra command complete event arrives for the LE_SET_RANDOM_ADDR
command. When this happends the kernel ends up not processing the
command complete for the following commmand, LE_SET_SCAN_PARAM, and
ultimately behaving as if a passive scanning procedure was being
performed, when in fact controller is performing an active scanning
procedure. This makes it impossible to discover BLE devices as no device
found events are sent to userspace.

This problem is reproducible on 100% of the attempts on the affected
controllers. The extra command complete event can be seen at timestamp
27.420131 on the btmon logs bellow.

Bluetooth monitor ver 5.50
= Note: Linux version 5.0.0+ (x86_64)                                  0.352340
= Note: Bluetooth subsystem version 2.22                               0.352343
= New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0)               [hci0] 0.352344
= Open Index: 80:C5:F2:8F:87:84                                 [hci0] 0.352345
= Index Info: 80:C5:F2:8F:87:84 (Qualcomm)                      [hci0] 0.352346
@ MGMT Open: bluetoothd (privileged) version 1.14             {0x0001} 0.352347
@ MGMT Open: btmon (privileged) version 1.14                  {0x0002} 0.352366
@ MGMT Open: btmgmt (privileged) version 1.14                {0x0003} 27.302164
@ MGMT Command: Start Discovery (0x0023) plen 1       {0x0003} [hci0] 27.302310
        Address type: 0x06
          LE Public
          LE Random
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6   #1 [hci0] 27.302496
        Address: 15:60:F2:91:B2:24 (Non-Resolvable)
> HCI Event: Command Complete (0x0e) plen 4                 #2 [hci0] 27.419117
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7  #3 [hci0] 27.419244
        Type: Active (0x01)
        Interval: 11.250 msec (0x0012)
        Window: 11.250 msec (0x0012)
        Own address type: Random (0x01)
        Filter policy: Accept all advertisement (0x00)
> HCI Event: Command Complete (0x0e) plen 4                 #4 [hci0] 27.420131
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2      #5 [hci0] 27.420259
        Scanning: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4                 #6 [hci0] 27.420969
      LE Set Scan Parameters (0x08|0x000b) ncmd 1
        Status: Success (0x00)
> HCI Event: Command Complete (0x0e) plen 4                 #7 [hci0] 27.421983
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
@ MGMT Event: Command Complete (0x0001) plen 4        {0x0003} [hci0] 27.422059
      Start Discovery (0x0023) plen 1
        Status: Success (0x00)
        Address type: 0x06
          LE Public
          LE Random
@ MGMT Event: Discovering (0x0013) plen 2             {0x0003} [hci0] 27.422067
        Address type: 0x06
          LE Public
          LE Random
        Discovery: Enabled (0x01)
@ MGMT Event: Discovering (0x0013) plen 2             {0x0002} [hci0] 27.422067
        Address type: 0x06
          LE Public
          LE Random
        Discovery: Enabled (0x01)
@ MGMT Event: Discovering (0x0013) plen 2             {0x0001} [hci0] 27.422067
        Address type: 0x06
          LE Public
          LE Random
        Discovery: Enabled (0x01)

Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
---
 net/bluetooth/hci_event.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Marcel Holtmann April 24, 2019, 6:45 a.m. UTC | #1
Hi Joao Paulo,

> This commit makes the kernel not send the next queued HCI command until
> a command complete arrives for the last HCI command sent to the
> controller. This change avoids a problem with some buggy controllers
> (seen on two SKUs of QCA9377) that send an extra command complete event
> for the previous command after the kernel had already sent a new HCI
> command to the controller.
> 
> The problem was reproduced when starting an active scanning procedure,
> where an extra command complete event arrives for the LE_SET_RANDOM_ADDR
> command. When this happends the kernel ends up not processing the
> command complete for the following commmand, LE_SET_SCAN_PARAM, and
> ultimately behaving as if a passive scanning procedure was being
> performed, when in fact controller is performing an active scanning
> procedure. This makes it impossible to discover BLE devices as no device
> found events are sent to userspace.
> 
> This problem is reproducible on 100% of the attempts on the affected
> controllers. The extra command complete event can be seen at timestamp
> 27.420131 on the btmon logs bellow.
> 
> Bluetooth monitor ver 5.50
> = Note: Linux version 5.0.0+ (x86_64)                                  0.352340
> = Note: Bluetooth subsystem version 2.22                               0.352343
> = New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0)               [hci0] 0.352344
> = Open Index: 80:C5:F2:8F:87:84                                 [hci0] 0.352345
> = Index Info: 80:C5:F2:8F:87:84 (Qualcomm)                      [hci0] 0.352346
> @ MGMT Open: bluetoothd (privileged) version 1.14             {0x0001} 0.352347
> @ MGMT Open: btmon (privileged) version 1.14                  {0x0002} 0.352366
> @ MGMT Open: btmgmt (privileged) version 1.14                {0x0003} 27.302164
> @ MGMT Command: Start Discovery (0x0023) plen 1       {0x0003} [hci0] 27.302310
>        Address type: 0x06
>          LE Public
>          LE Random
> < HCI Command: LE Set Random Address (0x08|0x0005) plen 6   #1 [hci0] 27.302496
>        Address: 15:60:F2:91:B2:24 (Non-Resolvable)
>> HCI Event: Command Complete (0x0e) plen 4                 #2 [hci0] 27.419117
>      LE Set Random Address (0x08|0x0005) ncmd 1
>        Status: Success (0x00)
> < HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7  #3 [hci0] 27.419244
>        Type: Active (0x01)
>        Interval: 11.250 msec (0x0012)
>        Window: 11.250 msec (0x0012)
>        Own address type: Random (0x01)
>        Filter policy: Accept all advertisement (0x00)
>> HCI Event: Command Complete (0x0e) plen 4                 #4 [hci0] 27.420131
>      LE Set Random Address (0x08|0x0005) ncmd 1
>        Status: Success (0x00)
> < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2      #5 [hci0] 27.420259
>        Scanning: Enabled (0x01)
>        Filter duplicates: Enabled (0x01)
>> HCI Event: Command Complete (0x0e) plen 4                 #6 [hci0] 27.420969
>      LE Set Scan Parameters (0x08|0x000b) ncmd 1
>        Status: Success (0x00)
>> HCI Event: Command Complete (0x0e) plen 4                 #7 [hci0] 27.421983
>      LE Set Scan Enable (0x08|0x000c) ncmd 1
>        Status: Success (0x00)
> @ MGMT Event: Command Complete (0x0001) plen 4        {0x0003} [hci0] 27.422059
>      Start Discovery (0x0023) plen 1
>        Status: Success (0x00)
>        Address type: 0x06
>          LE Public
>          LE Random
> @ MGMT Event: Discovering (0x0013) plen 2             {0x0003} [hci0] 27.422067
>        Address type: 0x06
>          LE Public
>          LE Random
>        Discovery: Enabled (0x01)
> @ MGMT Event: Discovering (0x0013) plen 2             {0x0002} [hci0] 27.422067
>        Address type: 0x06
>          LE Public
>          LE Random
>        Discovery: Enabled (0x01)
> @ MGMT Event: Discovering (0x0013) plen 2             {0x0001} [hci0] 27.422067
>        Address type: 0x06
>          LE Public
>          LE Random
>        Discovery: Enabled (0x01)
> 
> Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
> ---
> net/bluetooth/hci_event.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
> 
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index 609fd6871c5a..b349fc2b4123 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -3071,6 +3071,19 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb,
> 	*opcode = __le16_to_cpu(ev->opcode);
> 	*status = skb->data[sizeof(*ev)];
> 
> +	if (*opcode != HCI_OP_RESET && !hci_sent_cmd_data(hdev, *opcode)) {
> +		/* Some buggy controllers generate an extra command complete
> +		 * event for the previous command after a new commend was
> +		 * already sent, so if the opcode does not match the last sent
> +		 * command we should ignore this event.
> +		 *
> +		 * HCI_OP_RESET is a special case handled in
> +		 * hci_req_cmd_complete.
> +		 */
> +		bt_dev_err(hdev, "unexpected CC event opcode 0x%4.4x", *opcode);
> +		return;
> +	}
> +

the return; here is not going to fly. You need to go through the switch statement since it actually does bookkeeping for a lot of features even if the command is not executed by hci_req and instead comes from user space or someone else.

What is the problem with handling this in hci_req_cmd_complete. We have hci_sent_cmd_data check there already. Lets extend this to ensure that command complete for random commands don’t trigger us to continue with the request.

Regards

Marcel
=?UTF-8?q?Jo=C3=A3o=20Paulo=20Rechi=20Vita?= April 25, 2019, 6:37 a.m. UTC | #2
On Wed, Apr 24, 2019 at 2:45 PM Marcel Holtmann <marcel@holtmann.org> wrote:
>
> Hi Joao Paulo,
>
> > This commit makes the kernel not send the next queued HCI command until
> > a command complete arrives for the last HCI command sent to the
> > controller. This change avoids a problem with some buggy controllers
> > (seen on two SKUs of QCA9377) that send an extra command complete event
> > for the previous command after the kernel had already sent a new HCI
> > command to the controller.
> >
> > The problem was reproduced when starting an active scanning procedure,
> > where an extra command complete event arrives for the LE_SET_RANDOM_ADDR
> > command. When this happends the kernel ends up not processing the
> > command complete for the following commmand, LE_SET_SCAN_PARAM, and
> > ultimately behaving as if a passive scanning procedure was being
> > performed, when in fact controller is performing an active scanning
> > procedure. This makes it impossible to discover BLE devices as no device
> > found events are sent to userspace.
> >
> > This problem is reproducible on 100% of the attempts on the affected
> > controllers. The extra command complete event can be seen at timestamp
> > 27.420131 on the btmon logs bellow.
> >
> > Bluetooth monitor ver 5.50
> > = Note: Linux version 5.0.0+ (x86_64)                                  0.352340
> > = Note: Bluetooth subsystem version 2.22                               0.352343
> > = New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0)               [hci0] 0.352344
> > = Open Index: 80:C5:F2:8F:87:84                                 [hci0] 0.352345
> > = Index Info: 80:C5:F2:8F:87:84 (Qualcomm)                      [hci0] 0.352346
> > @ MGMT Open: bluetoothd (privileged) version 1.14             {0x0001} 0.352347
> > @ MGMT Open: btmon (privileged) version 1.14                  {0x0002} 0.352366
> > @ MGMT Open: btmgmt (privileged) version 1.14                {0x0003} 27.302164
> > @ MGMT Command: Start Discovery (0x0023) plen 1       {0x0003} [hci0] 27.302310
> >        Address type: 0x06
> >          LE Public
> >          LE Random
> > < HCI Command: LE Set Random Address (0x08|0x0005) plen 6   #1 [hci0] 27.302496
> >        Address: 15:60:F2:91:B2:24 (Non-Resolvable)
> >> HCI Event: Command Complete (0x0e) plen 4                 #2 [hci0] 27.419117
> >      LE Set Random Address (0x08|0x0005) ncmd 1
> >        Status: Success (0x00)
> > < HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7  #3 [hci0] 27.419244
> >        Type: Active (0x01)
> >        Interval: 11.250 msec (0x0012)
> >        Window: 11.250 msec (0x0012)
> >        Own address type: Random (0x01)
> >        Filter policy: Accept all advertisement (0x00)
> >> HCI Event: Command Complete (0x0e) plen 4                 #4 [hci0] 27.420131
> >      LE Set Random Address (0x08|0x0005) ncmd 1
> >        Status: Success (0x00)
> > < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2      #5 [hci0] 27.420259
> >        Scanning: Enabled (0x01)
> >        Filter duplicates: Enabled (0x01)
> >> HCI Event: Command Complete (0x0e) plen 4                 #6 [hci0] 27.420969
> >      LE Set Scan Parameters (0x08|0x000b) ncmd 1
> >        Status: Success (0x00)
> >> HCI Event: Command Complete (0x0e) plen 4                 #7 [hci0] 27.421983
> >      LE Set Scan Enable (0x08|0x000c) ncmd 1
> >        Status: Success (0x00)
> > @ MGMT Event: Command Complete (0x0001) plen 4        {0x0003} [hci0] 27.422059
> >      Start Discovery (0x0023) plen 1
> >        Status: Success (0x00)
> >        Address type: 0x06
> >          LE Public
> >          LE Random
> > @ MGMT Event: Discovering (0x0013) plen 2             {0x0003} [hci0] 27.422067
> >        Address type: 0x06
> >          LE Public
> >          LE Random
> >        Discovery: Enabled (0x01)
> > @ MGMT Event: Discovering (0x0013) plen 2             {0x0002} [hci0] 27.422067
> >        Address type: 0x06
> >          LE Public
> >          LE Random
> >        Discovery: Enabled (0x01)
> > @ MGMT Event: Discovering (0x0013) plen 2             {0x0001} [hci0] 27.422067
> >        Address type: 0x06
> >          LE Public
> >          LE Random
> >        Discovery: Enabled (0x01)
> >
> > Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
> > ---
> > net/bluetooth/hci_event.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> > index 609fd6871c5a..b349fc2b4123 100644
> > --- a/net/bluetooth/hci_event.c
> > +++ b/net/bluetooth/hci_event.c
> > @@ -3071,6 +3071,19 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb,
> >       *opcode = __le16_to_cpu(ev->opcode);
> >       *status = skb->data[sizeof(*ev)];
> >
> > +     if (*opcode != HCI_OP_RESET && !hci_sent_cmd_data(hdev, *opcode)) {
> > +             /* Some buggy controllers generate an extra command complete
> > +              * event for the previous command after a new commend was
> > +              * already sent, so if the opcode does not match the last sent
> > +              * command we should ignore this event.
> > +              *
> > +              * HCI_OP_RESET is a special case handled in
> > +              * hci_req_cmd_complete.
> > +              */
> > +             bt_dev_err(hdev, "unexpected CC event opcode 0x%4.4x", *opcode);
> > +             return;
> > +     }
> > +
>
> the return; here is not going to fly. You need to go through the switch statement since it actually does bookkeeping for a lot of features even if the command is not executed by hci_req and instead comes from user space or someone else.
>

Ok, so you want the same approach I had before but not bound to a
quirk. Sorry, when you previously said "ignore this command complete
and not start ahead with the next command" I thought you meant to also
avoid going through that switch statement.

> What is the problem with handling this in hci_req_cmd_complete. We have hci_sent_cmd_data check there already. Lets extend this to ensure that command complete for random commands don’t trigger us to continue with the request.
>

I'll send an updated revision soon.

--
João Paulo Rechi Vita
http://about.me/jprvita

Patch
diff mbox series

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 609fd6871c5a..b349fc2b4123 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -3071,6 +3071,19 @@  static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb,
 	*opcode = __le16_to_cpu(ev->opcode);
 	*status = skb->data[sizeof(*ev)];
 
+	if (*opcode != HCI_OP_RESET && !hci_sent_cmd_data(hdev, *opcode)) {
+		/* Some buggy controllers generate an extra command complete
+		 * event for the previous command after a new commend was
+		 * already sent, so if the opcode does not match the last sent
+		 * command we should ignore this event.
+		 *
+		 * HCI_OP_RESET is a special case handled in
+		 * hci_req_cmd_complete.
+		 */
+		bt_dev_err(hdev, "unexpected CC event opcode 0x%4.4x", *opcode);
+		return;
+	}
+
 	skb_pull(skb, sizeof(*ev));
 
 	switch (*opcode) {