How to DDoS yourself
This post was written by Michael from Bump's Android team and is part of a series about lessons learned while developing Bump 2.0 for Android.
One of the key things that any website worth its salt does is collect log data and make the most of it. This includes tracking which pages your users are accessing on your site, as well as errors and slow response times. The user behavior data can help you build a better product, and of course monitoring errors alerts you to problems that need to be fixed. For mobile apps, this is not as easy to do, since a lot is happening on a user’s smartphone instead of on a web server in your data center. Many mobile developers use third-party analytics services like Flurry, but since we already have a network stack at Bump, we do our own logging. One decision to make is when to send the logs to the server. We don’t want to be constantly using your data connection to send logs to Bump. We especially wouldn’t want to be doing this while you are using Bump. You shouldn’t have to wait for us to send a log file before you can send a picture to your mom.
As part of the 2.0 release of our Android app, we thought we had come up with a clever way to get these logs back to our servers without ever bothering our users. You see, on Android there are many events on the phone that an application can listen for and respond to. In particular, the Android OS fires such an event when you plug your phone in to charge (a broadcast Intent whose action is android.intent.action.ACTION_POWER_CONNECTED, if you are playing at home). Most people plug in their phones at night when they are going to bed, and thus are not using them. This seemed like a great time to send log files to the Bump servers.
The plan was simple. The Bump app registered a BroadcastReceiver that listened for this power connected event. This caused the Bump background service to wake up and start an upload of log files. Once it finished, the service would go back to sleep. What could go wrong?
There was one thing we took for granted, and in retrospect the mistake was obvious. We assumed that this event would be fired exactly once when the phone was plugged in and then never again until the user’s phone was unplugged and plugged in again. This assumption turned out to be not only false, but nearly fatal for our servers.
Shortly after Bump 2.0 for Android came out, we started noticing a disturbing trend in our server traffic. We would see large numbers of phones connect to our servers and then do nothing except tie up connections. Of course, we first noticed this one day while we were all hard at work here in our office in Mountain View. We also noticed that most of these “zombies” were coming from Japan and South Korea. We thought that perhaps there was something odd about the phones being sold over there, or maybe about the wireless networks.
As the number of downloads of Bump 2.0 continued to climb, things got worse. We had first noticed the zombies from Japan because they showed up during the work day in California. There turned out to be many more zombies right here in the United States. They came out at night, every night, and went away in the morning. We didn’t notice them until they set off monitoring alarms in the middle of the night here. One night we had to block all connections coming from Verizon IP addresses in order to keep the zombies at bay.
This pattern of zombie phones late at night suggested that our strategy of sending logs to the server when the phone was plugged in was to blame. So we changed that code, and no longer listened for the power connected event at all. This immediately killed off the zombies. Still, it left us scratching our heads about their cause. So we wrote a little test app. This app would simply count the number of times the power connected event was fired on a device. Then we would plug in a device for a while and see how many events the app had counted.
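The bookkeeping for a test app like that is tiny. Here is a minimal sketch in plain Java so it can run off-device. It is not the actual test app: the class name PowerEventCounter and its methods are hypothetical, and on a phone record() would be called from a BroadcastReceiver's onReceive each time the power connected Intent arrives.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the test app's bookkeeping; not Bump's actual code. */
final class PowerEventCounter {
    // Arrival time of every power connected event seen so far.
    private final List<Long> timestampsMillis = new ArrayList<>();

    /** Call once per received power connected event. */
    void record(long nowMillis) {
        timestampsMillis.add(nowMillis);
    }

    /** Total number of events recorded. */
    int count() {
        return timestampsMillis.size();
    }

    /** Gaps between consecutive events, in milliseconds. */
    List<Long> gapsMillis() {
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < timestampsMillis.size(); i++) {
            gaps.add(timestampsMillis.get(i) - timestampsMillis.get(i - 1));
        }
        return gaps;
    }
}
```

A well-behaved phone should leave count() at 1 for as long as it stays plugged in. Recording timestamps rather than a bare count also shows how regular any repeats are.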
We started testing devices, and we began with Verizon phones since we had seen large numbers of zombies on Verizon’s network. Most phones behaved just as they should, firing the event right when the phone was plugged in and never again until you unplugged it and plugged it back in. But one Verizon phone we tested did not: the Verizon Fascinate. Once plugged in to charge, this phone would fire the power connected event every ten seconds or so (there was actually quite a bit of variance). With our old log uploading code, this would cause our background service to wake up every ten seconds, shake hands with our servers, and then sit idle. In other words, it would create a zombie.
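Our fix was simply to stop listening for the event. For illustration only, here is one way code that does react to such an event could tolerate a phone that fires it repeatedly: allow at most one upload per minimum interval. This is a sketch under assumptions, not something we shipped. The class name UploadThrottle, the method tryAcquire, and the choice of interval are all hypothetical.

```java
/** Hypothetical guard; the names and the interval are illustrative assumptions. */
final class UploadThrottle {
    private final long minIntervalMillis;
    private long lastUploadMillis;
    private boolean hasUploaded = false;

    UploadThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /**
     * Returns true if an upload may start now, and records the attempt.
     * Returns false if the previous upload started less than
     * minIntervalMillis ago, so repeated events are ignored.
     */
    synchronized boolean tryAcquire(long nowMillis) {
        if (hasUploaded && nowMillis - lastUploadMillis < minIntervalMillis) {
            return false;
        }
        hasUploaded = true;
        lastUploadMillis = nowMillis;
        return true;
    }
}
```

With an interval of several hours, a phone that fires the event every ten seconds would wake the uploader once per night instead of thousands of times. On a real device the last-upload time would need to be persisted (for example in SharedPreferences), since the process can be killed between events. This sketch keeps it in memory.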
If you are familiar with Android phones, you will recognize that the Fascinate is Verizon’s version of the Samsung Galaxy S. The Galaxy S is perhaps the most popular Android phone in the world. We tested several other Galaxy S phones, and none of them had the zombie gene like the Fascinate. The presence of zombies in Japan and South Korea suggests there are phones being sold there that have the zombie gene, perhaps some other flavor of the Galaxy S. We can’t say which one for sure. We’re just glad that we can once again sleep through the night without being woken up by zombies.
Want to work with Michael and the rest of the Bump team? We are hiring: http://bu.mp/jobs.