
Adventures in OTA Land

  1. cyanogen Steve Kondik Apr 29, 2015

    Hey everyone. I wanted to write a little bit about the crazy adventure we had when rolling out Cyanogen OS 12. This was possibly the most troublesome OTA we've done for any device yet, not because of the software (which is pretty damn good IMHO), but because of the process required to actually install it when coming from CM11. This is pretty technical stuff, but I wanted to be transparent about it and mostly get it off my chest, because it was as frustrating for us as it was for some of you.

    Moving up to L was a big project. This was only the second time in the history of CM that we decided to "start over" using a clean AOSP + CAF base, reevaluate our features against the new stuff Google did for L, and bring all of our stuff back in. We did this when ICS came out, too. It's a bit more than a "rebase", since large amounts of code had to be refactored or redesigned due to the extensive platform changes in L. It's good to do this from time to time when dealing with an upstream like AOSP, because it keeps your code "fresh" and working well with the rest of the system. It took longer than we had anticipated, but we did the bulk of the work in the open so that diehards could use the nightly builds and so that other ROMs could build off of us while we got it production-ready.

    With all that out of the way, we moved on to certification. And of course, unlike in the past, this was not a "fire and forget" situation either. The first issue we hit was with the camera stack. While we have our own camera application in CM, the factory team owns the actual imaging stack, as it integrates a number of third-party products that we surface as features like Clear Image, Smart Scene, and RAW capture. Unfortunately, that stack was designed for KitKat and didn't pass certification. We have our own stack that did pass, but it didn't have these third-party features, and we wanted to avoid axing these awesome features at all costs. So we got that sorted out, and while the solution isn't perfect, the quality and feature set are still as good as they have been. Simultaneously, Google released a new test suite that "fixed" one of the DRM video playback tests, which you were previously "allowed" to fail. As it turns out, the reason we failed it wasn't actually the reason the failures were waived, and we discovered that we had a bug in our extended media codec support (I lost a lot of sleep over this one). With these out of the way, we decided to ship.

    It's no secret that we use a slow rollout system. We do this so we can monitor the feedback you guys post and also view the diagnostic data we get from the process. We found an issue where Play Services could wreck your battery, and we also found that encrypted devices were failing to upgrade. We also found that some alternative recoveries would fail to install the update or otherwise leave the device in an unusable state. This was serious enough that we stopped the rollout to resolve the issues before causing any more carnage. With the Play Services issue fixed and the upgrade process for encrypted devices all set, we started the rollout again. Then we started getting feedback that people still weren't able to upgrade their encrypted devices. This made no sense at first; we had tested this a hundred times end-to-end and could not reproduce the issue at all. Fortunately, we were able to locate a user in Seattle (where our engineering office is) who had the issue, and he brought the device down to us so we could have a look at it. After much head-scratching, what we found was a highly demonic bug lurking in the "uncrypt" code.

    Traditionally, the /cache partition is used to download OTA files before flashing them, but this update was much bigger than the small cache partition on the Bacon (the OnePlus One). The solution is to download it to the /data partition instead, but /data is encrypted, and the recovery system can't read it. Google came up with a clever solution called "uncrypt", which runs in Android after the OTA file has been downloaded, just before the device shuts down. Uncrypt rewrites the decrypted file back into the raw partition and creates a "block map", which the recovery system can use to reconstruct the file without having to actually mount the encrypted storage. Sounds crazy, but it works well. Well, it does now.
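    To make the block-map idea concrete, here is a minimal Python sketch. It is not the actual uncrypt code: the "device" is an in-memory buffer standing in for the encrypted /data partition, and the helper names (build_block_map, uncrypt_write, read_via_block_map) and the simplified map layout (file size, block size, and a list of half-open physical block ranges) are assumptions for illustration only. The uncrypt step overwrites the file's own physical blocks with the decrypted bytes and records where they live; the recovery step reads those ranges straight off the raw device without mounting anything.

```python
import os
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed filesystem block size for this sketch


@dataclass
class BlockMap:
    """Hypothetical, simplified block map (not uncrypt's real on-disk format)."""
    file_size: int                  # length of the OTA file in bytes
    block_size: int                 # bytes per block
    ranges: list[tuple[int, int]]   # half-open [start, end) physical block ranges


def build_block_map(physical_blocks: list[int], file_size: int,
                    block_size: int = BLOCK_SIZE) -> BlockMap:
    """Coalesce the file's physical block numbers (in file order) into ranges."""
    ranges: list[tuple[int, int]] = []
    for blk in physical_blocks:
        if ranges and ranges[-1][1] == blk:
            ranges[-1] = (ranges[-1][0], blk + 1)  # contiguous: extend last range
        else:
            ranges.append((blk, blk + 1))          # gap: start a new range
    return BlockMap(file_size, block_size, ranges)


def uncrypt_write(device: bytearray, plaintext: bytes, bmap: BlockMap) -> None:
    """Android side: overwrite the file's own raw blocks with decrypted bytes."""
    pos = 0
    for start, end in bmap.ranges:
        for blk in range(start, end):
            chunk = plaintext[pos:pos + bmap.block_size]
            off = blk * bmap.block_size
            device[off:off + len(chunk)] = chunk
            pos += bmap.block_size


def read_via_block_map(device: bytes, bmap: BlockMap) -> bytes:
    """Recovery side: rebuild the file from raw blocks, no filesystem mount."""
    out = bytearray()
    for start, end in bmap.ranges:
        out += device[start * bmap.block_size:end * bmap.block_size]
    return bytes(out[:bmap.file_size])  # drop padding in the final block


# Usage: random bytes stand in for ciphertext on the encrypted partition.
device = bytearray(os.urandom(16 * BLOCK_SIZE))
ota = os.urandom(3 * BLOCK_SIZE + 100)           # needs 4 blocks
bmap = build_block_map([2, 3, 9, 10], len(ota))  # a fragmented file
uncrypt_write(device, ota, bmap)
assert read_via_block_map(device, bmap) == ota
```

    Coalescing contiguous blocks into ranges keeps the map small even for a large OTA, since a mostly contiguous file collapses into a handful of ranges.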
    After a bit more lost sleep, our engineers found an integer overflow in the uncrypt code. It only manifested once the /data partition was full enough, and it made the block map inaccurate, so recovery ended up reading random encrypted data instead of the downloaded file and failed to verify it. Mystery solved!
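    The post doesn't say which calculation overflowed, so the following Python sketch is only a hypothetical illustration of how this kind of bug can corrupt a block map. It assumes a byte offset computed as block number times block size and stored in a 32-bit unsigned integer (emulated here with a mask, since Python integers never overflow); offset_buggy and offset_fixed are made-up names. With 4 KiB blocks, any block past the 4 GiB mark wraps around to a small offset, so the bug would only bite files whose blocks sit far enough into the partition, which is one plausible reason it showed up only on fuller devices.

```python
BLOCK_SIZE = 4096         # assumed 4 KiB filesystem blocks
UINT32_MASK = 0xFFFFFFFF  # emulates storing a value in a 32-bit unsigned int


def offset_buggy(block: int) -> int:
    """Hypothetical bug: block * BLOCK_SIZE truncated to 32 bits."""
    return (block * BLOCK_SIZE) & UINT32_MASK


def offset_fixed(block: int) -> int:
    """Fix: keep the full-width offset (64-bit in C; Python ints never wrap)."""
    return block * BLOCK_SIZE


# First block whose byte offset no longer fits in 32 bits (the 4 GiB mark).
FIRST_BAD_BLOCK = (1 << 32) // BLOCK_SIZE

for blk in (1_000, FIRST_BAD_BLOCK - 1, FIRST_BAD_BLOCK, FIRST_BAD_BLOCK + 1_000):
    good, bad = offset_fixed(blk), offset_buggy(blk)
    status = "ok" if good == bad else "WRONG: points at some other block"
    print(f"block {blk:>8}: offset {good:>11} vs {bad:>11} -> {status}")
```

    In the last row the wrapped offset equals block 1,000's offset, so recovery would silently read an unrelated, still-encrypted block instead of reporting an error; only the signature check on the reconstructed file would catch it.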

    We restarted the rollout today and plan to have it out to everyone by the end of the week.

    We're also working hard to get 12.1 rolled out in the coming weeks. It brings in the new changes from Android 5.1.1, a few new features of our own like LiveDisplay, and, of course, bugfixes.
     

  2. Plenkske Dutch POC Assistant Head Moderator Apr 29, 2015

    Wow, great explanation, and I can really understand your problems...
    (I'm a software engineer myself)
     

  3. FayeMarkV Ice Cream Sandwich Apr 29, 2015


  4. cyanogen Steve Kondik Apr 29, 2015

    Yeah, this was a rough one. Rolling out an OTA is tough: lots of unexpected edge cases come up that are impossible to plan for. The best thing you can do is test a lot and be super proactive about monitoring issues once the update actually hits devices. We certainly learned a lot.
     

  5. floxigen Gingerbread Apr 29, 2015


  6. sun_rajan Gingerbread Apr 29, 2015

    Great work, and I appreciate your efforts. As users, we always expect some information on the progress.

    Thanks for the update and keep up the good work.
     

  7. chinny562 Gingerbread Apr 29, 2015

    Reserved.

    Awesome! Just finished reading it and I'm excited for another update! Thanks a lot!
     

  8. SaadHusain Donut Apr 29, 2015

    Wow. That was a good read. Thanks for the detailed and technical information.
    I know many have been waiting for the update, and even though there was a lot of discontent, this post really highlights the struggle that you guys went through. Thanks for doing what you do, and know that it is highly appreciated. +1
     

  9. Pradeep_M Froyo Apr 29, 2015

    High five to the team for a marvelous effort. But the major concern is that these things should have been communicated in a timely manner, so everyone would be on the same page and there would be less panic and fewer hoaxes.

    Anyhow, great work and a good update. Keep up the good work; we believe in you.
     

  10. PRK.R KitKat Apr 29, 2015

    @cyanogen Thanks for the detailed description. People become impatient because they don't know about the hardships and hurdles involved in developing such updates.

    Thanks to you and your team for your efforts. :)
     
    Last edited: Apr 29, 2015

  11. tttl Ice Cream Sandwich Apr 29, 2015

    It is good to be transparent.

    Thanks for your update (although it should have come earlier).
     

  12. Sudheendrakv Cupcake Apr 29, 2015


  13. m313m Donut Apr 29, 2015

    Thank you for all the explanations. Everyone who is a software engineer understands you :)
     

  14. cyanogen Steve Kondik Apr 29, 2015

    It's usually better to get it done and talk about it afterwards, when the problems are resolved and fully understood. Had I given a play-by-play during the process, I would have gotten crucified :)
     

  15. 98schaeffer Eclair Apr 29, 2015


  16. Halino Cupcake Apr 29, 2015

    Thank you very much for this post. The delay is acceptable if there is a valid reason, as you explained to us.
     

  17. aswarth7 Froyo Apr 29, 2015


  18. rcrdBrt Cupcake Apr 29, 2015


  19. ashishgarg Gingerbread Apr 29, 2015

    Thanks a lot for the COS 12 update. The update is awesome, but there are a few problems that are really critical and need to be addressed ASAP.

    1. With the flip cover on, the phone vibrates only once when a call comes in, although the ringing is fine. This is a problem in silent mode: we tend to miss a lot of calls because of it :(:(

    2. The contacts app is still crashing sometimes.

    3. Audio FX does not work all the time. Rebooting fixes the issue temporarily. Sometimes Google Music doesn't even play any track...


    Please fix these issues ASAP, especially the vibration issue, pleaseeeeee!!!!!!
     

  20. venkataravi Jelly Bean Apr 29, 2015

    Is this official news from Cyanogen? Is he really Steve Kondik? Can someone confirm this?
     
