Bluetooth crash and data corruption - Snapdragon 865 based devices

  1. OliverDeLange
    Donut Feb 1, 2021

    OliverDeLange , Feb 1, 2021 :
    We have noticed that many of our customers with Snapdragon 865 based phones are experiencing Bluetooth issues, including a crash and data corruption.

    I have been through the troubleshooting steps with OnePlus customer support, and even sent the device off to a service center to verify it wasn't a hardware issue. While this was in progress, we investigated further, and we think the issue lies with the SD 865 Bluetooth firmware.

    OnePlus customer support suggested I raise the issue here so OnePlus developers can investigate and resolve it.

    I managed to reproduce both issues on a OnePlus 8T, and the logcat logs are below. To reproduce, I ran our app, which connects to 2 BLE peripherals for a long duration, and after 5 hr 30 min Bluetooth crashed.

    The frequency of this crash is variable. For example, we have three customers using a OnePlus 8 Pro. The first experiences the crash in 80% of their sessions. The second in 15%. The third hasn't experienced the crash at all.

    Unfortunately I didn't manage to sniff BLE traffic over the air to verify the data corruption, but I will do this when the device is returned to me.

    The crash and the data corruption seem to be linked, in that when the crash occurs, the data is corrupted. However, not every case of data corruption is accompanied by a crash.

    Additionally, it's worth pointing out that this crash happens on SD 865 based phones from other manufacturers, e.g. Samsung S20 FE, S20+, S20 Ultra, LG V60. I have reported it to Samsung on a public forum if you wish to see the crash logs from the Samsung device to compare. I also have a support case with LG via email, which I cannot share.

    I also tried emailing Qualcomm directly at 'support.cdmatech@qualcomm.com' but I don't expect a response, as they don't seem to offer support to consumers of devices with their chips in.


    I have a bug report from the OnePlus 8T which I can share with developers. I think the most important part is the dumpstate for bluetooth_manager, which shows the multiple BLE crashes on the device:
    DUMP OF SERVICE bluetooth_manager:
    Bluetooth Status
    enabled: true
    state: ON
    address: 5C:17:CF:88:EE:DB
    name: OnePlus 8T
    time since enabled: 00:01:54.393

    Enable log:
    12-29 15:15:28 Enabled due to APPLICATION_REQUEST by com.google.android.gms
    12-29 15:27:14 Disabled due to APPLICATION_REQUEST by com.android.systemui
    12-29 15:27:18 Enabled due to APPLICATION_REQUEST by com.google.android.gms
    12-29 15:27:29 Enabled due to APPLICATION_REQUEST by com.android.systemui
    12-29 15:27:52 Disabled due to CRASH by android
    12-29 15:27:52 Enabled due to APPLICATION_REQUEST by com.google.android.gms
    12-29 15:27:52 Enabled due to APPLICATION_REQUEST by com.google.android.gms
    12-29 15:35:21 Disabled due to CRASH by android
    12-29 15:35:21 Enabled due to APPLICATION_REQUEST by com.google.android.gms
    12-29 15:35:21 Enabled due to APPLICATION_REQUEST by com.google.android.gms
    12-29 15:35:33 Disabled due to CRASH by android
    12-29 15:35:33 Enabled due to APPLICATION_REQUEST by com.google.android.gms
    12-29 15:35:33 Enabled due to APPLICATION_REQUEST by com.google.android.gms
    12-29 15:35:49 Disabled due to CRASH by android
    12-29 15:35:49 Enabled due to APPLICATION_REQUEST by com.google.android.gms
    12-29 15:35:49 Enabled due to APPLICATION_REQUEST by com.google.android.gms
    12-29 15:37:32 Disabled due to CRASH by android
    12-29 15:37:32 Enabled due to APPLICATION_REQUEST by com.google.android.gms
    12-29 15:37:32 Enabled due to APPLICATION_REQUEST by com.google.android.gms
    12-29 15:42:06 Disabled due to CRASH by android
    12-29 15:42:06 Enabled due to APPLICATION_REQUEST by com.google.android.gms

    Bluetooth crashed 12 times
    12-29 14:46:29
    12-29 14:49:49
    12-29 15:06:19
    12-29 15:07:35
    12-29 15:11:14
    12-29 15:15:28
    12-29 15:27:52
    12-29 15:35:21
    12-29 15:35:33
    12-29 15:35:49
    12-29 15:37:32
    12-29 15:42:06
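
    For anyone who wants to check their own device for the same pattern, here is a rough sketch (not part of our app) that pulls the crash entries out of a saved bluetooth_manager dump, for example the text produced by `adb shell dumpsys bluetooth_manager`. The function name `crash_events` and the regular expression are my own illustration, and they assume the enable log keeps the line format shown above.

```python
import re

# Assumed line format, taken from the enable log above:
#   "12-29 15:27:52 Disabled due to CRASH by android"
CRASH_LINE = re.compile(
    r"^\s*(\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+Disabled due to CRASH by (\S+)"
)


def crash_events(dump_text):
    """Return (timestamp, package) pairs for each 'Disabled due to CRASH' line."""
    events = []
    for line in dump_text.splitlines():
        match = CRASH_LINE.match(line)
        if match:
            events.append((match.group(1), match.group(2)))
    return events


# Short excerpt copied from the enable log above.
SAMPLE = """\
12-29 15:27:29 Enabled due to APPLICATION_REQUEST by com.android.systemui
12-29 15:27:52 Disabled due to CRASH by android
12-29 15:27:52 Enabled due to APPLICATION_REQUEST by com.google.android.gms
12-29 15:35:21 Disabled due to CRASH by android
"""

if __name__ == "__main__":
    for timestamp, package in crash_events(SAMPLE):
        print(timestamp, package)
```

    Note that the enable log above lists fewer crash entries than the "Bluetooth crashed 12 times" summary, so a count taken from the enable log alone may understate the total.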



    Logcat output from BLE crash:
    12-29 14:33:54.461 11926 11945 W bt_hci_packet_fragmenter: reassemble_and_dispatch got continuation for unknown packet. Dropping it.
    12-29 14:33:54.461 11912 11985 E vendor.qti.bluetooth@1.0-uart_controller: OnDataReady: Invalid hci packet type byte received 0x0, invalid_bytes_counter_ = 0
    12-29 14:33:54.461 11912 11985 E vendor.qti.bluetooth@1.0-uart_controller: OnDataReady: Invalid hci packet type byte received 0x0, invalid_bytes_counter_ = 1
    12-29 14:33:54.461 11912 11985 D vendor.qti.bluetooth@1.0-uart_controller: SsrCleanup: SSR triggered due to 18 skip sending special buffer
    12-29 14:33:54.461 11912 11985 D vendor.qti.bluetooth@1.0-uart_controller: ReportSocFailure: reason 18
    12-29 14:33:54.463 11912 11985 I vendor.qti.bluetooth@1.0-logger: BtPrimaryCrashReason:Invalid HCI cmd type received
    12-29 14:33:54.463 11912 11985 I vendor.qti.bluetooth@1.0-logger: BtSecondaryCrashReason:Default
    12-29 14:33:54.463 11912 11985 I vendor.qti.bluetooth@1.0-logger: TS for SoC Crash:Tue Dec 29 14:33:54 2020
    12-29 14:33:54.463 11912 11985 I vendor.qti.bluetooth@1.0-logger: FrameCrashEvent: for primary 18 - secondary 0 crash reason with TS:Tue Dec 29 14:33:54 2020
    12-29 14:33:54.463 11912 11985 D vendor.qti.bluetooth@1.0-uart_controller: SendCrashPacket send crash reasons to the client
    12-29 14:33:54.463 11926 11988 E bt_btm : BTM Event: Vendor Specific crash event from controller
    12-29 14:33:54.463 11926 11988 E bt_btm : decode_crash_reason: PrimaryCrashReason:Invalid HCI cmd type received
    12-29 14:33:54.463 11926 11988 E bt_btm : decode_crash_reason: SecondaryCrashReason:Default at time Tue Dec 29 14:33:54 2020
    12-29 14:33:54.463 11912 11985 D vendor.qti.bluetooth@1.0-uart_controller: SendBqrRiePacket sending vendor specific crash reason to the client
    12-29 14:33:54.463 11912 11985 I vendor.qti.bluetooth@1.0-logger: FrameBqrRieEvent: for crash reason code :12
    12-29 14:33:54.463 11912 11985 D vendor.qti.bluetooth@1.0-uart_controller: ReportSocFailure send H/W error event to FM/ANT/BT client
    12-29 14:33:54.463 11912 11985 D vendor.qti.bluetooth@1.0-data_handler: OnPacketReady: packet discarded and not handled
    12-29 14:33:54.463 11912 11985 D vendor.qti.bluetooth@1.0-data_handler: OnPacketReady: discarded packet[0] = 0x1a
    12-29 14:33:54.463 11912 11985 D vendor.qti.bluetooth@1.0-data_handler: OnPacketReady: discarded packet[1] = 0x01
    12-29 14:33:54.463 11912 11985 D vendor.qti.bluetooth@1.0-data_handler: OnPacketReady: discarded packet[2] = 0x0f
    12-29 14:33:54.463 11912 11985 D vendor.qti.bluetooth@1.0-data_handler: OnPacketReady: packet discarded and not handled
    12-29 14:33:54.463 11912 11985 D vendor.qti.bluetooth@1.0-data_handler: OnPacketReady: discarded packet[0] = 0x1c
    12-29 14:33:54.463 11912 11985 D vendor.qti.bluetooth@1.0-data_handler: OnPacketReady: discarded packet[1] = 0x01
    12-29 14:33:54.463 11912 11985 D vendor.qti.bluetooth@1.0-data_handler: OnPacketReady: discarded packet[2] = 0x0f
    12-29 14:33:54.463 11912 11985 E vendor.qti.bluetooth@1.0-logger: Rx HW error event::Crash reason not found
    12-29 14:33:54.464 11926 11988 E bt_hci : Ctlr H/w error event - code:0xf



    For context, our customers know about the crash because the device turns Bluetooth off and on again to remedy the situation, and we alert the user when Bluetooth is turned off. So they hear "Error, bluetooth is turned off", and then see their peripherals reconnecting.

    The data corruption is visible because we have a check to detect when our peripheral timestamp rolls over. The check is whether the current BLE packet's timestamp is <= the previous packet's timestamp. Therefore, when the data is corrupted, this condition is true and the session clock jumps forward 49 days. By sniffing BLE traffic over the air, I have been able to verify on other devices that the data sent by the peripheral is not what the device delivers to the app, but I have not yet done this on an SD 865 based device. I'm still working on confirming this.
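
    To illustrate how a single corrupted timestamp can produce the jump described above, here is a minimal sketch of that kind of rollover check. It is not our production code: the class name `SessionClock`, the sample values, and the assumption that the peripheral timestamp is an unsigned 32-bit millisecond counter (which would wrap after roughly 49.7 days) are all illustrative.

```python
# Assumption: the peripheral timestamp is a 32-bit millisecond counter,
# so one full wrap corresponds to 2**32 ms (about 49.7 days).
ROLLOVER_SPAN_MS = 2 ** 32
MS_PER_DAY = 24 * 60 * 60 * 1000


class SessionClock:
    """Tracks elapsed session time, treating a non-increasing timestamp as a rollover."""

    def __init__(self):
        self.previous = None
        self.rollovers = 0

    def update(self, raw_timestamp):
        # The check described above: current timestamp <= previous timestamp.
        if self.previous is not None and raw_timestamp <= self.previous:
            self.rollovers += 1
        self.previous = raw_timestamp
        return self.rollovers * ROLLOVER_SPAN_MS + raw_timestamp


if __name__ == "__main__":
    clock = SessionClock()
    # Invented values: two good packets, one corrupted (far too high), one good.
    for raw in (1000, 2000, 4_000_000_000, 3000):
        elapsed = clock.update(raw)
        print(raw, elapsed, round(elapsed / MS_PER_DAY, 2))
```

    In this sketch the corrupted value itself passes the check, but the next good packet is then lower than the stored value, so it is counted as a rollover and the session clock gains a full wrap.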

    Hoping you guys can speak with Qualcomm about this and resolve the issue via an OTA software update to the affected devices. This is affecting our customers' usage of our product.
     

    #1
    Swejuggalo and G_plusone like this.
  4. Swejuggalo
    OnePlus 9 Series Expert Community Expert Feb 1, 2021

    Swejuggalo , via OnePlus 8 Pro , Feb 1, 2021 :
    I assume you already collected logs for the bug hunters via the LogKit tool and all that.

    This is what I mean when I talk about looking for an issue beyond just the specific brand/device. Look at the SoC as a whole. Look at other brands with the same SoC. Look at the base OS version regardless of brand. Think wider.
     

    #4
  5. MRTLima
    KitKat Feb 1, 2021

    MRTLima , via OnePlus 7T , Feb 1, 2021 :
    Extremely well written!!!
     

    #5
  6. Nipu_1998
    The Lab Reviewer - OnePlus 9 Series Feb 2, 2021

    Nipu_1998 , Feb 2, 2021 :
    Well explained with all the data.
    You can submit it in the feedback section and the staff will look into it.
     

    #6
  8. banandhi
    Gingerbread Feb 2, 2021

    banandhi , via OnePlus 8T , Feb 2, 2021 :
    I still have the OP 5T, which is still running really smoothly.

    OP has started going downhill. With so many software and hardware bugs and a substandard camera, OP is no longer a quality brand. I'm thinking of switching back to Samsung: more reliable, and yes, a little costly by comparison, but a brand which has been able to sustain its hardware, software and after-sales support quality for decades.

    I think Carl Pei surely left the company, even though he was a partner, because of this degradation in product quality.

    The OP 8T is going to be my last phone from OnePlus, a brand which is going downhill and fast losing its fan base.
     

    #8
  9. OliverDeLange
    Donut Feb 2, 2021

    OliverDeLange , Feb 2, 2021 :
    Yes, this is a concern I have. I'm hoping if OP, Samsung and LG all contact Qualcomm about this, they might do something.

    Thankfully, an OP bug hunter has reached out so I'm feeling positive.
     

    #9
    G_plusone and Swejuggalo like this.
  10. banandhi
    Gingerbread Feb 2, 2021

    banandhi , via OnePlus 8T , Feb 2, 2021 :
    Sure they should, if Samsung also starts using Qualcomm processors in their flagship phones.
     

    #10
    G_plusone likes this.
  11. Swejuggalo
    OnePlus 9 Series Expert Community Expert Feb 2, 2021

    Swejuggalo , via OnePlus 8 Pro , Feb 2, 2021 :
    They do in some regions. The S20 and S21 lines are Snapdragon in the USA, for example. It was the same when I read about the WiFi issues: the reports came from the USA and described issues identical to the ones users had here at that time.
     

    #11
    G_plusone likes this.
  12. G_plusone
    Nougat Feb 3, 2021

    G_plusone , Feb 3, 2021 :
    They do use Snapdragon processors
     

    #12
  14. OliverDeLange
    Donut Apr 15, 2021

    OliverDeLange , Apr 15, 2021 :
    So I'm re-opening this thread because this issue is still not fixed, and OnePlus have gone quiet:
    upload_2021-4-15_9-23-46.png

    I've also managed to reproduce the second issue we're seeing, which is a corruption of the BLE packet payloads that happens at the same time as the BLE stack crash.

    This issue boils down to the screenshot below, which shows the over-the-air Bluetooth packets on the left and the on-device Bluetooth packets on the right.
    Packet number 846866 on the left is corrupted at some point on the device before reaching the Bluetooth stack and our application.
    Comparing the payload timestamps, the expected value is 2119320, as seen on the left. However, the bt_snoop logs, captured via OnePlus' Bluetooth Exception LogKit option, show the timestamp with a value of 4265198064, which is far too high.

    upload_2021-4-15_9-26-3.png

    Details of the corruption

    The corrupted packet appears to have partially the same payload as the previous packet, and partially nonsense.

    The previous good packet's value is "03:7d:00:19:e8:ea:e5:e4:e1:e0:e3:d7:d5:ea:df:df:e4:dc:e2:e6:ea:e6:f3:e4:e9:ea:ea:ed:f2:eb:e6:e2:e2:e4:ec:e2:e3:e3:e5:ed:00:00:00:00:00:00:00:00:00:00:00:00:ae:5f:36:05:e1:fd:d7:54:9e:00:52:00:0e:08:8e:56:20:00:7d"

    The corrupt packet's value is
    "03:7d:00:19:e8:ea:e5:e4:e1:e0:e3:d7:d5:ea:df:df:e4:dc:e2:e6:02:0a:10:1b:00:ed:f5:e8:ed:ed:e0:e9:e4:f1:ec:ea:e8:e9:f2:eb:e8:ec:ed:ef:f3:00:00:00:00:00:00:00:02:0a:10:18:00:00:00:00:00:00:a1:04:d7:6f:f0:c1:39:fe:0f"

    The first 20 bytes (03:7d:00:19 through e2:e6) are shared between the two packets. This shouldn't happen because byte 2 and the last byte in the packet payload are monotonically increasing integers, added to aid with debugging this issue. You can see how byte 2 and the last byte are both "7d" in the good packet above. We can probably ignore the actual packet payload as this may not change.
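
    The comparison above can be reproduced with a short script. This is only a sketch: the helper names are made up for illustration, and the timestamp layout (a little-endian unsigned 32-bit value in the four bytes just before the trailing counter byte) is an assumption on my part, chosen because it reproduces the 4265198064 value seen in the bt_snoop capture.

```python
# Payloads copied verbatim from the two packets quoted above.
GOOD = (
    "03:7d:00:19:e8:ea:e5:e4:e1:e0:e3:d7:d5:ea:df:df:e4:dc:e2:e6:"
    "ea:e6:f3:e4:e9:ea:ea:ed:f2:eb:e6:e2:e2:e4:ec:e2:e3:e3:e5:ed:"
    "00:00:00:00:00:00:00:00:00:00:00:00:ae:5f:36:05:e1:fd:d7:54:"
    "9e:00:52:00:0e:08:8e:56:20:00:7d"
)
CORRUPT = (
    "03:7d:00:19:e8:ea:e5:e4:e1:e0:e3:d7:d5:ea:df:df:e4:dc:e2:e6:"
    "02:0a:10:1b:00:ed:f5:e8:ed:ed:e0:e9:e4:f1:ec:ea:e8:e9:f2:eb:"
    "e8:ec:ed:ef:f3:00:00:00:00:00:00:00:02:0a:10:18:00:00:00:00:"
    "00:00:a1:04:d7:6f:f0:c1:39:fe:0f"
)


def parse(hex_string):
    """Turn a colon-separated hex dump into bytes."""
    return bytes.fromhex(hex_string.replace(":", ""))


def shared_prefix_length(first, second):
    """Count how many leading bytes the two payloads have in common."""
    count = 0
    for a, b in zip(first, second):
        if a != b:
            break
        count += 1
    return count


def counters_match(payload):
    """Byte 2 (index 1) and the last byte carry the same debug counter."""
    return payload[1] == payload[-1]


def timestamp(payload):
    """Assumed layout: little-endian uint32 just before the trailing counter."""
    return int.from_bytes(payload[-5:-1], "little")


if __name__ == "__main__":
    good, corrupt = parse(GOOD), parse(CORRUPT)
    print("shared prefix bytes:", shared_prefix_length(good, corrupt))
    print("good counters match:", counters_match(good))
    print("corrupt counters match:", counters_match(corrupt))
    print("good timestamp:", timestamp(good))
    print("corrupt timestamp:", timestamp(corrupt))
```

    Under that assumed layout the good packet decodes to a timestamp of 2119310 and the corrupt one to 4265198064. The corrupt packet also fails the counter check, since byte 2 is still 7d while its last byte is 0f.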

    This is backed up by the Wireshark packet captures. Below is the packet as sniffed over the air, before it was corrupted on the device, containing the expected payload (notice it starts and ends with 7f)
    upload_2021-4-15_9-33-30.png


    Below is the corrupted packet as sniffed on the device, using OnePlus' Bluetooth Exception option within LogKit to provide bt_snoop logs (notice how it doesn't end with 7d):
    upload_2021-4-15_9-34-44.png



    In fairness to OnePlus, this is a pretty niche issue, and one that is super hard to reproduce, let alone capture all the required debugging info for. Luckily for them, I've done all that, and can share all the supporting captures, logcat logs, bug reports, bt_snoop info etc. for their development team to fix this.

    I still have a feeling it's an issue with the Snapdragon 865 chip, because Samsung said they've fixed the issue but we're still getting customers coming in with the same problem.

    Wish me luck.
     

    #14
    G_plusone likes this.
  15. Swejuggalo
    OnePlus 9 Series Expert Community Expert Apr 15, 2021

    Swejuggalo , via OnePlus 9 Pro Stellar Black , Apr 15, 2021 :
    This might also be good to mention and document on GitHub.
    https://github.com/OnePlusOSS/android_kernel_oneplus_sm8250 (11 branch)
    One of the issues I monitor and have described has been linked by a Dev with some possible solutions.
     

    #15
    G_plusone likes this.