From MongoDB to Riak

This post was written by Tim from Bump's server team.

At Bump Technologies, we recently completed a significant database migration from MongoDB to Riak. Almost all of our users' data -- the lists of people they've bumped, communications sent and received, handset information, social network OAuth tokens, etc. -- had been stored in MongoDB, but if you open the app today all of these interactions will be backed by Riak.

A few months ago we had migrated our datastore containing individual communication contents from MongoDB to Riak, so these databases and general migration techniques were not entirely foreign to us, but any time you have to move data while keeping the system online, it's a unique and complicated challenge.

Development obstacles

The largest development hurdle was dealing with the structured key-document nature of MongoDB vs. the key-value nature of Riak combined with the latter's eventual consistency. At Bump we use Google's Protocol Buffers extensively, for serialization to Riak and message passing between the Bump client and the server, and between server application nodes. In addition to being an efficient encoding, it also enforces types and provides for backwards- and forwards-compatibility. MongoDB's key-document knowledge and consistency model allow it to support, e.g., atomic list appends, which we would use to add to a user's list of communications when they sent one.

Riak does not have any internal knowledge of how the Protocol Buffers we are storing are formed, and cannot guarantee the same kinds of consistency. Instead, we have to grab the value for the key, update it ourselves, and put it back in. That's easy (if a bit more verbose than what MongoDB does), but the difficulty comes when dealing with concurrency. One of our app nodes could get a user's record, then append the communication the user just sent, and save it to Riak. The database, being eventually consistent, cannot guarantee that when another app node gets the record it contains the communication just written. If the second app node then makes changes and overwrites the key with its new value, data will be lost.

Riak is pretty smart, though, and will save both of these values. Upon the next get, it will return the two valid values (if, that is, the node has received both!), leaving it up to the app to deal with the multiple truths. With appropriate logic (set unions, timestamps, etc) it is easy to resolve these conflicts, but dealing with this everywhere a database fetch happens is cumbersome. Instead, we've written a thousand-line Haskell program that acts as an interface between the app nodes and the database, which exclusively deals with conflict resolution and sibling merges. Every Riak interaction is done through this tool that we've dubbed /magicd/, which guarantees the app nodes always see a consistent truth. This obviates the largest pain point of eventually consistent databases.
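To make the "set unions, timestamps" idea concrete, here is a minimal sketch of a sibling merge in Python. It is not the code inside magicd (which is Haskell). The UserRecord type and its field names are hypothetical stand-ins for a decoded user protobuf.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UserRecord:
    """Hypothetical stand-in for a decoded user protobuf."""
    communications: frozenset = field(default_factory=frozenset)
    locale: str = ""
    locale_updated_at: int = 0  # timestamp of the last locale write


def merge_siblings(siblings):
    """Collapse conflicting versions of one record into a single value."""
    if not siblings:
        raise ValueError("no siblings to merge")
    # Append-only data: the union keeps every communication any writer saw.
    communications = frozenset().union(*(s.communications for s in siblings))
    # Overwritable data: the most recent write wins.
    newest = max(siblings, key=lambda s: s.locale_updated_at)
    return UserRecord(communications, newest.locale, newest.locale_updated_at)
```

The union suits data that only ever grows, such as a list of communications, while a timestamp comparison suits fields that are simply overwritten.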

After evaluating our migration options, we decided to approach it on-the-fly. The other significantly considered technique was migrating in parallel; everywhere Mongo was written we would append a write to Riak, and everywhere a Mongo read was done it would be prefixed with a read from Riak, falling back to Mongo if the data had not been migrated. This two-database parallel approach was appealing in that if something horrible happened during the transition, we could revert to pre-migration behavior (i.e., drop all of the Riak code), and the data would be consistent and up-to-date in Mongo as before.
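The parallel approach we considered could be sketched roughly as follows. This is an illustration only: plain dicts stand in for the two databases, and the DualStore class is a hypothetical name, not something from our codebase.

```python
class DualStore:
    """Sketch of the parallel approach; both stores are plain dicts here."""

    def __init__(self, mongo, riak):
        self.mongo = mongo
        self.riak = riak

    def write(self, key, value):
        # Mongo stays up to date so a rollback loses nothing,
        # then the same write is appended to Riak.
        self.mongo[key] = value
        self.riak[key] = value

    def read(self, key):
        # Prefer Riak; fall back to Mongo for data not yet migrated.
        if key in self.riak:
            return self.riak[key]
        return self.mongo.get(key)
```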

The difficulty with this approach was that, despite almost all of our database interactions being encapsulated in a class, there were a few scattered circumstances in which the database was written to outside of the class. Those instances, of course, were only a grep away, but the significant amount of code / logic duplication for those special-cased interactions was a deterrent.

Instead, we decided to migrate all of the users' data on the fly on login. With the first packet received from the handset, we block all other communication until the user has had all of its data written to Riak. It then proceeds as normal, with all other database interactions in the server codebase having been converted to use Riak exclusively. The upside of this compared to the previous parallel approach was simplicity; s/mongo/riak/g and call it a day, instead of handling two data code paths. The downside was that a rollback to Mongo in case of significant errors could lose data, either temporarily if the new data written to Riak was recoverable, or permanently if not. Despite the downside, with extensive testing we were able to ferret out any bugs that would result in data loss, and suffered none on roll out.
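A rough sketch of the migrate-on-login flow, under simplifying assumptions: dicts stand in for both databases, an in-process threading.Lock stands in for the networked lock a multi-node deployment needs, and LoginMigrator is a hypothetical name.

```python
import threading


class LoginMigrator:
    """Sketch of migrate-on-login; dicts stand in for both databases."""

    def __init__(self, mongo, riak):
        self.mongo = mongo
        self.riak = riak
        self._guard = threading.Lock()
        self._user_locks = {}

    def _lock_for(self, user_id):
        # One lock per user, created on first use.
        with self._guard:
            return self._user_locks.setdefault(user_id, threading.Lock())

    def on_first_packet(self, user_id):
        # Hold the per-user lock so nothing else for this user proceeds
        # until the data has been written to Riak.
        with self._lock_for(user_id):
            if user_id not in self.riak and user_id in self.mongo:
                self.riak[user_id] = self.mongo[user_id]
        return self.riak.get(user_id)
```

Once a user's record exists in Riak it is never copied again, so later logins go straight to the Riak data.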

Post-deployment obstacles

Significant testing before rollout could nevertheless not definitively show us all of the bugs that existed. Specifically, we encountered a few errors that were only made evident with production data and traffic.

The first issue we saw after deployment was one of our Redis instances complaining about hitting its maximum connection limit. We use Redis throughout Bump's backend, performing a number of duties including queueing, session caching, and even geolocation mapping. Using it as a networked locking mechanism for migration purposes was a natural progression. The code we wrote, now included as part of the diesel async framework we use, did not appropriately close the connection in some circumstances. With that fixed, the Redis connections dropped to a normal and expected number.
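The general shape of that kind of fix is to release the lock and close the connection in a finally block, so cleanup happens on every code path. The sketch below is not the diesel code: StubConnection is a hypothetical stand-in for a Redis connection, and its method names are made up for illustration.

```python
from contextlib import contextmanager


class StubConnection:
    """Hypothetical stand-in for a Redis connection."""

    def __init__(self, locks):
        self.locks = locks      # shared set playing the role of Redis keys
        self.closed = False

    def acquire(self, key):
        if key in self.locks:
            return False
        self.locks.add(key)
        return True

    def release(self, key):
        self.locks.discard(key)

    def close(self):
        self.closed = True


@contextmanager
def migration_lock(conn, key):
    """Hold `key` for the duration of the block, then always clean up."""
    acquired = conn.acquire(key)
    try:
        yield acquired
    finally:
        # Runs on success and on exceptions alike, so the connection never leaks.
        if acquired:
            conn.release(key)
        conn.close()
```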

The second class of issues we encountered was a product of the relaxed schema of Mongo and the strict serialization of Protocol Buffers. There's no way to enforce types for values within a Mongo document; an integer may be stored as an integer, or a string representation of the same, for example. Depending on the leniency of application code, this fast-and-loose typing may be acceptable.

Protocol Buffers are much more strict about the types they accept. If one tries to assign a string to a value defined as an int32 in the protobuf definition file, it will refuse to be serialized. This manifested itself in our code when we tried to serialize the Apple Push Notification Service token; in some circumstances, it contained characters that broke when being parsed as a Protocol Buffers string; changing that type to bytes fixed the issue. This modification of the definition file preserved backwards-compatibility because strings and bytes are stored in serialized format in the same way.

Unlike JSON/BSON, where there is no defined schema and one can throw in any values they desire, Protocol Buffers can specify required fields. An object that must have a user_id, for example, cannot be serialized without that field set. This does wonders when doing client-server communication or serialization to a database, for contracts can be made and consistently followed. During the migration, there were a number of fields that should have been set in Mongo but were not. Luckily, they were comparatively menial values -- last seen Bump version number and locale for a user, for example -- but their absence in Mongo caused the migration pipeline to break. Setting empty locales and 0.0.0 version numbers was deemed a satisfactory resolution considering the importance of the data.
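The kind of normalization this required can be sketched as a function that coerces a loosely typed document into strictly typed fields before serialization. The field names below are hypothetical, and the UTF-8 encoding of the token is an assumption. Only the empty-locale and 0.0.0 defaults come from the migration described above.

```python
def normalize_user(doc):
    """Coerce a loosely typed Mongo document into strictly typed fields."""
    if "user_id" not in doc:
        # No sensible default exists for the identifier, so fail loudly.
        raise ValueError("user_id is required")
    token = doc.get("apns_token", b"")
    if isinstance(token, str):
        token = token.encode("utf-8")  # store the token as bytes, not a string
    return {
        "user_id": int(doc["user_id"]),          # accepts 42 or "42"
        "apns_token": token,
        "locale": doc.get("locale", ""),         # empty locale when missing
        "version": doc.get("version", "0.0.0"),  # placeholder version when missing
    }
```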

With the above issues solved, our Python processes stopped spewing hundreds of exceptions, and things on the app node side seemed to be cooling down. At this point, we turned our attention to Riak load; we started to see latencies rise to concerning levels. In addition, we started receiving user reports that database-heavy operations were acting slower than they had prior. Further investigation showed that our magicd processes were pegging the CPU, so as a temporary fix, we increased the number of threads the process would spawn. Benchmarking revealed that magicd was serializing and deserializing every protobuf that went through it; this was only necessary when siblings existed and needed to be resolved, so fixing it so deserialization only occurred when required reduced load to a more manageable level.
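The optimization amounts to a fast path that skips decoding when there is nothing to merge. Here is a minimal sketch, with JSON standing in for Protocol Buffers and a set union standing in for the real merge logic. The resolve function and its parameters are hypothetical.

```python
import json


def resolve(raw_siblings, decode=json.loads, encode=json.dumps):
    """Return one serialized value, decoding only when a merge is needed."""
    if len(raw_siblings) == 1:
        # Fast path: pass the stored value through untouched.
        return raw_siblings[0]
    decoded = [decode(raw) for raw in raw_siblings]
    # Example merge: set union of the list items, sorted for a stable result.
    merged = sorted(set().union(*decoded))
    return encode(merged)
```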

While this helped magicd CPU requirements, we were still seeing elevated Riak latencies. Investigating the logs showed some particularly abnormal behavior on one node in particular; upon sshing in and poking around, it became apparent that one of its disks had failed. Shutting down the riak daemon on that node caused latencies to fall back to normal levels. After getting the disk issues resolved, we were able to bring the node back online. Thankfully, due to Riak's distributed nature, a missing node in the cluster was not a problem; we still saw great performance.

Post-Mortem

We decided to move to Riak because it offers better operational qualities than MongoDB. Despite the migration hiccups, we're pleased with the process and the result. Now we no longer care if one of the nodes kernel panics in the middle of the night, as has happened a few times already. Nagios will email us instead of paging us, and over coffee the next morning we'll fire up IPMI, reboot the machine, and Riak will read-repair as necessary. No longer will we have to do any master-slave song and dance, nor will we fret about performance, capacity, or scalability; if we need more, we'll just add nodes to the cluster.

If you have any questions about our experience, or want to work with us on this joyful stack (we're hiring!) -- shoot me an email: timdoug@bu.mp.

 

git for dropbox users: Don't be afraid!

git is awesome

We use git at Bump for all sorts of things. Recently, we’ve started converting our designers and product managers. You can think about git like Dropbox: it takes files on your computer, puts them in the cloud, and lets other people work on them too. But it has some pretty rocking advantages:

Collaboration

Like Dropbox, git lets multiple people work on the same files at the same time. This is great when many people are working on a document together, or a group of people are creating assets for a website. Everyone can be contributing to the same folder and files, without fear of overwriting each other’s work, something Dropbox can screw up.

Time Machine

Even if you don’t have a team of people, git has some awesome-sauce. It can act like a time machine, allowing you to go and see what you did last week, or even last year, help you recover files you deleted, and keep track of progress over time. It has powerful tools to let you see what you’ve changed over time.

And More

git is also great for backing things up, so you don’t lose them. It can let you learn about how you work: when do you do your best work? How often do you change everything? There are tools to show you visual differences between versions of images, and all sorts of things.

but

git is awesome… but conceptually it’s a leap from most people’s mental models of what a file is.

There are so many wonderful reasons for non-programmers to use git when working with other people (especially programmers) and I’d like to convince you that there are even more awesome rewards down the way of git, but I won’t do that today.

My goal today:

Provide an analogy to Dropbox to help designers and product managers understand the mental model of how git works.

It might help you to think about git like Dropbox to start. It’s like Dropbox but with a whole lot more steps. Have faith (for the moment) that the more steps are worth it, and I’ll talk you through the differences.

First, a quick overview of how Dropbox works: many people can be working on the same file on their own computers, and magically they all get each other’s changes. Behind the scenes Dropbox is doing all sorts of things to make this work. With git, you end up helping do some of the behind the scenes work, and git thanks you.

As easy as 4… 5… 6!

Here is a more explicit model of how Dropbox works.

  1. You install Dropbox.
  2. Someone shares a folder with you.
  3. Your computer downloads the contents of that folder from the “Cloud”
  4. You make some changes to a file in the folder and save it.
  5. Your changes get sent up to the Dropbox “Cloud” immediately
  6. Everyone else gets your changes downloaded to their computers immediately.

Yay. This is just like git! Only git lacks some of the “immediately” magic. Steps 4, 5, and 6 don’t happen automatically: you have to tell git when you want it to do each step. Let me break it down for you.

4) Saving… are you sure?

The first difference is that git doesn’t assume that you want to send every saved change to the cloud. It makes you be explicit about what you want to send to the cloud.

Whereas Dropbox automatically uploads everything when you save it, git adds a new step: commitment. After you save a file, you must tell git that you are “committed” to this change.

To do this, you do what is called “staging.” This is really just telling git which changes you are sure about. This way, git doesn’t waste other people’s time if you make a draft you don’t like.

5) Uploading

Second difference: Dropbox uploads your files immediately. (It tries to anyway). git isn’t quite as eager to get things into the cloud.

In git, uploading to the cloud is called “pushing.” You are literally pushing the changes from your computer up in to the cloud. Only then will other people be able to download them.

6) Downloading

The next difference is downloading. Can you spot the difference? Why, that’s right! While Dropbox downloads the new changes automatically, git waits for you to tell it to.

You must ask git explicitly to “pull” the changes down from the cloud to your computer. This gives you the latest greatest copy of the documents, including everyone else’s changes.

Stay strong! Keep faith that even though there is seemingly less magic, and more work, there are reasons for all this mortal toil [sic]!

Ordering, or 6… 4… 6… 5…???

git requires you to operate with a slightly different order of operations because it is more manual (slightly less magic).

Before you start changing things, you want to make sure you have the latest and greatest version with everyone else’s changes. That means the first thing you want to do is download the latest stuff from the cloud (i.e. do a pull).

Now that you have the latest, you can change it and edit it to your heart’s content. When you are happy with your changes you are ready to tell git you’re happy (i.e. commit).

You can also rinse lather and repeat step 4 many times, committing small changes before going to the next step.

But wait a second! What if Brian in marketing changed the document WHILE you were working on it? Because git doesn’t automatically download the changes, your computer (and you, by extension) doesn’t know about the new changes. So this adds another step, redownloading (i.e. doing another pull!)

Okay, now git knows about your changes, it knows about Brian’s changes, and the only thing left to do is… Push! Finally we can upload our changes to the cloud for everyone else to see and look at.

That’s the gist of git.

It’s like Dropbox, but you have to tell it to download, save, download, and upload each time. (Pull, stage & commit, pull, and push)

If you are digging it so far, let’s tackle some Advanced differences.

Advanced Differences

There are some other model differences between git and Dropbox. The first is conflicts: what happens when two people change the same file. Another big one is the fact that git is “distributed,” i.e. there is no company holding the cloud for you… there can even be multiple clouds.

Here are some quick overviews of salient differences.

Time Travel

Every commit makes a little tick on a timeline. git keeps track of EVERY little tick. That means you can zoom back in time, and watch as your document changes from day 0 to present. You can see every change, along with who made it, and what their “commit message” was. Pretty cool.

git has multi-player undo. You can find out exactly who made the typo, and when, and then fix it really easily. (git has really awesome tools for this too.)

Conflict

What happens when Brian changed that file in the cloud, while you were working on it on your computer? Dropbox takes care of it magically in the cloud, now… it’s your job. The second step 6 (re-pulling changes from the cloud) in your work flow is when you are most likely to have “merge conflicts”.

Usually git is pretty magical and figures out what to do if both you and Brian were editing the same file. For example if he changed the title of the document, and you added some foot notes, git will figure it out. However, if you change your salary to $350,000, and Brian changes it to $40, git isn’t going to know what to do.

If Brian pushed his change first, git will make you fix it. It will say: “Brian said you make $40 an hour, but you said you make $350,000, who’s right?” Your call. Fix it, save it (commit it) and push it! (But do pull it first, in case Brian caught his error and changed it to $400,000 while you were typing.)

Meta Comments

Every time you save (commit really) some changes, git asks you for a “commit message”. This is a couple of sentences that describe the changes you made: a brief overview so that someone could review the list of changes to this document, without having to read the whole thing.

Style

Because git is a little more manual, there are a few stylistic changes suggested to make everyone happy. First of all, small changes. You should commit every time you finish changing a coherent piece. I.e. if you are changing your salary, and Brian’s, first make one commit with your increase and a commit message like “merit based salary adjustments”, then adjust Brian’s with a commit message of “balance the budget”. Good (informative) commit messages are very appreciated in git land.

Due to the nature of fixing conflicts, git also prefers text. Images are much harder for it (and you) to merge if both people change at once. So whenever possible use a text format (like html or markdown).

Fin

Well, I don’t know if I convinced you that you HAVE to use git, but hopefully you understand it a little better now. If you are already using it, hopefully now it makes a little more sense. At the very least, you are a step closer to being a friend and co-conspirator of git.

Easy partial screen modals on iOS

This post was written by Anders from Bump's iOS team.

Say you want to present a modal view on an iPhone but you don't want it to be full screen. What do you do? Well, there is always the option of not using presentModalViewController:animated: and performing the animations yourself, which in turn means maintaining the view and controller hierarchy yourself.

You could of course subclass UINavigationController and encapsulate all that complexity in a subclass. That would of course mean that you would need to update your code in all places that use the UINavigationController. Another similar approach would be to subclass UIView and override setFrame: to not allow the frame to be set to full screen. Again, that would mean updating all relevant views to subclass your new UIView subclass with the setFrame: logic.

A different approach, which lets you keep your code pretty much unchanged, requires you to change the way UINavigationController itself behaves. The following solution lets you set the frame of your UINavigationController's view, which will then be maintained even if a modal view controller is presented.

All you need to do is call a static initializer method on this new BPNavigationController class:

[BPNavigationController setup];

Now you can set the UINavigationController's view frame. Like so:

navigation.view.frame = CGRectMake(0.0f, 20.0f, 320.0f, 440.0f);

If at any later point you present a modal view controller in a navigation controller, that modal view controller will only be as big as the frame of the navigation controller's view.

How is this magic achieved? Behold:

Who said Android apps can’t look good?

This post was written by Indy from Bump's Android team.

Today we are releasing Bump 2.4 or what we have been calling the Visual Refresh. Earlier this year we hired our first (and so far only, we're hiring) badass visual designer, Shona Dutta. Since then she’s been on a crusade to make our apps and everything else we present (including the recent infographic) look as awesome as our technology.

So when it was time for the Android app to get its much needed facelift we wanted to jump on it right away. Unfortunately we were still finishing up some other major features in the app, so our team’s summer intern, Will Whitney, cut a branch and started churning away. (If you’re an awesome Android dev, this is why we need you: there are too many things to get done and we are currently only two people.) He left about a month ago, and finally in the last two weeks we had the time to finish up the work he started.

In the last 10 months we’ve learned a lot about the ins and outs and how to build a great app on Android. One of the things we learned quickly was people rightfully bagged on Android apps not looking as good as their iOS counterparts. But the reasons they pointed to were just plain wrong.

It’s a common misunderstanding that Android apps are harder to program than iOS. People say it’s because the lifecycle behind Activities (Android’s rough equivalents of iOS UIViewControllers) is hard to work with. Not true; I think reading through the documentation on that once should clear up a lot of confusion. People say it’s because the UI code is a bunch of messy xml. Not true; yes it’s xml, but it’s xml used in the way it should be: declaratively and in a DRY manner. The xml based layouts are what make UI programming quite a lot nicer: one, because they force view code to be purely view code; two, because simple parameter tweaking can produce great results.

In fact I joke with our iOS team that Android UI is ridiculously fast to iterate on. Case in point is the global network status bar we have on top of the new app. Whenever the network goes down or is unavailable on any screen of the app, a thin black bar rolls down above the whole UI (squeezing the content area below it), letting the user know what went wrong. This simple squeezing of the UI and adding a view on top of every screen took less than two hours to put together.

The real reason Android Apps historically haven’t looked good is because people haven’t put in the same care in building those apps. From a business sense that made sense in the past. Android has never been the primary platform for app users. But the rate of growth of the platform is phenomenal, and even if you believe that in general an Android user is less likely to use apps than an iOS user, the raw numbers should shock you into putting the same care into your Android app as your iOS app.

So it’s finally here: Bump 2.4 for Android. An App that finally looks as good as it works.

 

Bump's Visual Refresh for iOS

This post was written by Shona from Bump's design team.


Bump is simple. Right? You bump, and data is transferred from one phone to another. It can be anything from your contact info to photos to apps, but the premise is basic and totally functional. This was what I figured back when I started working on the visual redesign of the app.

Three months later, my opinion has changed (kind of a lot). Bump is an incredibly complex, nuanced app that calls for a whole lot of visual interplay between screens, between elements on those screens, and across the whole application. It’s got hundreds of permutations and possibilities in what can be on the screen at any time. Fortunately I had the help of Abby, our UX designer here at Bump, when I undertook the work of giving Bump a visual “refresh.”

Our choice was to make Bump more tactile. The app is based on the physical bumping of two phones, and there’s something really magical about highlighting a physical interaction in a world where you can talk with someone without ever seeing them, and show them photos without standing next to them. We love the fact that people get a kick out of making real contact when bumping. To carry that further, I decided to explore textures, shadows, and layering in the new design. These are things we see in real life and that make the app look touchable.

Our homescreen background is paper-like; the buttons & graphics are tending toward a letter-pressed style; both of these are enormously tactile in appearance. Subtle dropshadows and glows give buttons, notes and bars the impression of popping up and out of the screen. This is all stuff we see everyday (just look at the pages of a book or the buttons on a TV). These details come from things we interact with and that we know how to use.

At Bump we’re trying to capitalize on the intuitive understanding that people have of objects, buttons,  and icons, and use them in the app to help our users get where they want to go. In the end it’s all about making Bump easy, and a pleasure, to use.

As with any large undertaking this project takes time. We’re only part-way finished with the redesign and we’re already looking ahead to a larger overhaul. But in the meantime I’m enjoying the new look we’ve injected into the app. It’s a huge step forward in making Bump even more a pleasure to use.

Want to work with Shona and the rest of the Bump team? We are hiring: http://bu.mp/jobs.

 

 

How we use Scala in Bump for Android

This post was written by Will from Bump's server team.

Since Indy's post and Michael and Indy's talk at Scala Days, many of you have been asking about how we used Scala for the Android version of Bump. I'll try to cover a little of the why (without getting too deep into the Scala v. Java battle) and then get into the how. But first, a little background.

As with many startups, we're primarily limited by the number of developers we have working on projects and our Android team was no exception. After releasing 2.0 on iOS, Jamie and I (the server team at the time) had plenty of things we wanted to do to the server, but none of them seemed as pressing as bringing Android up to parity with iOS. Knowing the network stack better than most, we decided that Jamie and I would write the Service while Jake K. and Indy would switch over from iOS and write the Application layer. The final decision ended up being that the Service would be in Scala, whereas the Application would be in Java. So then, why did Jamie and I choose Scala?

Why Scala?

  • It has a lot of functional constructs we're used to and like. We use Haskell at Bump for various projects and we're both heavy users of the list comprehensions, first-class functions and other functional constructs that are available in Python. Scala let us express problems in a way we were used to.
  • Scala works very well with existing Java frameworks. Android is, of course, one big Java framework and we were going to have to interface with it a lot. A number of people asked, given the above, why not use Clojure--this is why.
  • An Android Service can withstand higher latency. Scala runs a little slower than Java (I've heard about 6x in benchmarks, but I'd bet it's worse in Dalvik), but this was something we could tolerate. If an application takes 50ms extra to render a cell in a list, the user will most certainly notice. If, however, the Service takes 50ms more to process something it had to fetch from the network, nobody cares--it took 10x longer to do the network fetch anyway.

Why not Scala?

With all that out of the way, it's worth taking a minute to talk about some of the problems we ran into.

  • Scala doesn't have public member variables. To support some of the syntactic changes that Scala makes, it replaces public member variables with accessor functions. In Scala, when you do something like foo.bar, it implicitly calls the function bar() (that returns the value of the private member variable bar) on foo. Java, of course, doesn't do this. So, when you are using Scala classes in Java code, everything becomes a function and you have to explicitly refer to foo.bar() everywhere. An easy problem to deal with, but it requires the programmer to remember whether something they are using is a Scala class or a Java one.
  • The above creates a more subtle problem with Android that I wanted to call out specifically--Parcelables. Android's Parcelable interface depends on being able to access the public member variable CREATOR on a class that implements Parcelable. If you create the class in Scala, then, of course, CREATOR ends up being a private member variable with a matching accessor function. There's no easy way around this problem, but thankfully Stéphane Micheloud committed a patch to the Scala trunk to solve this problem. This does mean if you're planning on using Parcelables, you're going to need to use a build of Scala that includes this patch.
  • Scala doesn't have an equivalent concept to public static final variables in interfaces (and honestly, it's a bit of a mystery to me why they exist in Java as well). This problem can easily be solved by creating a Java wrapper class that maps values from the interface to member variables on the class. For example,

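import com.some.interface.SomeConsts;

class Consts {
      // Re-export the interface constants as static members of a class.
      // The int type is an assumption for illustration; use each constant's real type.
      public static final int CONST_A = SomeConsts.CONST_A;
      public static final int CONST_B = SomeConsts.CONST_B;
      ...
}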

  • To support anonymous functions and some of the other language constructs that Scala adds, it ends up creating a far larger number of classes than Java. Generally, this shouldn't be a problem, however, Android appears to have a limit on the number of classes that a single application can have. I don't have details on exactly what this limit is, but we did find that inheriting from the Scala classes pushed that number up a lot. At one point, we tried inheriting from the Map class and that pushed us over the edge. In general, inheritance can be rewritten as a standard class with the proposed parent class as a member variable, but it's certainly not as clean.

How Scala?

Before I get too deep into the specifics of how we did it, I want to point everyone to Stéphane Micheloud's site on putting Scala on Android. Not only is this site a great resource, but Stéphane was immeasurably helpful in getting us up and running.

Our build process consists of three steps: the first compiles the Google Protobufs that we use to communicate between our client and the server, the second invokes the Scala compiler to build the Service, and the third goes back to the Java compiler to build the Application level code. I've attached the full versions of our build.xml and build-scala.xml (a modified version of the one on Stéphane's site), but I'll call out a few things here that might be of interest.

<property name="extensible.classpath" value="${scala-library.jar}" />

If you're accessing any Scala objects in Java, you're going to need to let the Java compiler know how those objects work. Unfortunately, there's no mechanism for extending properties in ant, so when we started working on a new feature that involved Google Maps, we had to change this to:

<property name="extensible.classpath" value="${scala-library.jar};${path.to.maps.jar}" />

<target name="-service-pre-compile"
        description="Compiles the protocol buffers"
        depends="-aidl, -resource-src">
  <javac encoding="ascii" target="1.5" debug="true" extdirs=""
         includes="src/com/google/**,src/com/bump/proto/**"
         destdir="${out.classes.absolute.dir}"
         bootclasspathref="android.target.classpath"
         verbose="${verbose}">
    <src path="." />
    <classpath>
      <fileset dir="${jar.libs.absolute.dir}" includes="*.jar" />
    </classpath>
  </javac>
</target>

This is our compile step for protocol buffers. It looks just like the version in the Android SDK, except we've given it a new name and restricted the files it's compiling with the includes statement.

<target name="scala-compile" depends="-service-pre-compile"
    description="Compiles project's .scala files into .class files"
    if="myapp.containsScala" unless="do.not.compile">
    <condition property="logging" value="verbose" else="none">
        <istrue value="${verbose}" />
    </condition>
    <property prefix="scala"
        resource="compiler.properties"
        classpathref="scala.path" />
    <echo
        message="Scala version ${scala.version.number} - http://scala-lang.org"
        level="info" taskname="fsc" />
    <fsc
        srcdir="." includes="gen/**,src/com/bump/core/**,src/com/bump/util/*.java,src/com/bump/*.java"
        destdir="${out.classes.dir}"
        bootclasspathref="android.target.classpath"
        deprecation="yes"
        logging="${logging}" addparams="${scalac.addparams}">
        <classpath>
            <pathelement location="${scala-library.jar}" />
            <pathelement location="${out.classes.dir}" />
            <fileset dir="${jar.libs.absolute.dir}" includes="*.jar" />
        </classpath>
    </fsc>
    <touch file="${out.dir}/classes.complete" verbose="no"/>
</target>

This is the step to compile the Scala code (from build-scala.xml). One thing we changed from Stéphane's version is that we're using fsc instead of scalac. fsc (Fast Scala Compiler) acts just like the regular Scala compiler, but it's faster. It does, however, do some caching to speed things up, so if you're trying to do a clean build, you need to tell fsc (fsc -reset).

Jamie uses vim and I use emacs, and both of us like the command line too much to use Eclipse, so this is where I'm going to have to stop. Since I know many of you use Eclipse, I'll call out Indy and Jake to talk about how they handled the Scala code in Eclipse.

Want to work with Will and the rest of the Bump team? We are hiring: http://bu.mp/jobs.

Introducing Stud

This post was written by Jamie from Bump's server team.

So a few weeks ago, when the zombie army lurched its way toward our servers, a long-neglected scaling issue likewise rose from the dead.

Like all responsible companies, we strive to keep our users' data protected; so, we tunnel our proprietary socket protocol in TLS. Unfortunately, our use of other-than-HTTP immediately removes the usual candidate for TLS/SSL termination--nginx--from the running.

So, when we were designing our backend systems last summer, we grabbed the only serious open source contender for adding TLS to an arbitrary socket: the venerable stunnel project. Stunnel has three "threading models": ucontext, threads, and processing. We decided to try each and see what worked best.

We had high hopes for ucontext, since the low memory overhead of setjmp/longjmp-based coroutines would work really well for our particular application. As opposed to something like HTTP, which is typically comprised of many short-lived, active connections, Bump holds open low-bandwidth connections to a very high number of clients at once. So minimizing the memory penalty of each connected client was important to us.

But the ucontext threading module of stunnel performed surprisingly poorly. We could saturate one core with only ~1k mostly-idle concurrent connections--and there was no graceful approach for utilizing more than one core without introducing another load balancing layer. This may very well have been due to some of the flaws the folks at RethinkDB recently documented on their blog.

Next, we tried using the pthread-based threaded mode. As programmers, we trust threads about as far as we can throw them--but admit that written skillfully and very carefully, threaded programs can perform excellently. Unfortunately, when put under load, stunnel threw all kinds of nasty assertions and segfaults in threaded mode.

So prefork was looking pretty good at this point. We happened to be using a machine with a ridiculous amount of memory (north of 80GB), and copy-on-write semantics meant that each child process would only use around 2MB of resident memory. Furthermore, under the concurrency estimates we had at that time (as well as 12-month projections), the OS scheduler would handle things quite nicely--and to top it off, fork-based multiprocessing is an extremely robust way to run servers, as Apache, PostgreSQL, Unicorn, etc., have proven time and time again.

All in all, we were feeling pretty pleased with ourselves until those pesky zombies *octupled* our concurrent connections overnight.

(If you were wondering, by the way, where process scheduling breaks down--at least, on a mostly-stock Linux kernel--the magic number seems to be around 10-15k processes. By 20k, you are basically doing no actual work--only context switching. And fork() can take upwards of 5 seconds.)

I suppose we *could* have just fired up lots of servers with lots of RAM and spent our way out of the problem. But the whole situation was starting to get a little ridiculous. So we decided to try our hand at a solution.

The result is stud. Stud is the Scalable TLS Unwrapping Daemon (and yes, we worked very hard to make that acronym fit). It takes the nginx model of using one process per CPU core and then doing asynchronous I/O within each process. It has a low memory overhead per secure connection and it minimizes heap allocations to the extreme. It's easy to use and easy to configure.

Since deploying stud over a month ago, memory usage on our load balancing machines is down tenfold. Load is down 80%, and TLS handshake times are back under 20ms consistently.

Here's the load on one of our TLS termination servers running stud, handling over 3,000 secure connections (including ~50 handshakes a second):

someserver:~$ netstat -na | grep ':2000' | grep ESTAB | wc -l
3337
someserver:~$ cat /proc/loadavg
0.67 0.66 0.61 3/415 18476
someserver:~$ cat /proc/cpuinfo | grep processor | wc -l 
16

We decided to open source it in case it helps anyone else out there keep the zombies at bay--it's a particularly natural partner for haproxy, which is our production deployment. Feel free to drop me a line with any feedback, file bugs on github, or submit pull requests--we'd love improvements!

Want to work with Jamie and the rest of the Bump team? We are hiring: http://bu.mp/jobs.

 

How to DDOS yourself

This post was written by Michael from Bump's Android team and is part of a series about lessons learned while developing Bump 2.0 for Android.

One of the key things that any website worth its salt does is collect log data and make the most of it. This includes things like tracking what pages your users are accessing on your site, and also things like errors that happen or slow response times. The user behavior data can help you build a better product, and of course monitoring errors alerts you to problems that need to be fixed. For mobile apps, this is not as easy to do since a lot is happening on a user’s smartphone instead of on a web server in your data center. Many mobile developers use 3rd party analytics services like Flurry, but since we already have a network stack at Bump, we do our own logging. One decision to make is when to send the logs to the server. We don’t want to be constantly using your data connection to send logs to Bump. We especially wouldn’t want to be doing this while you were using Bump. You shouldn’t have to wait for us to send a log file before you can send a picture to your mom.

As part of our 2.0 release of our Android app, we thought we came up with a clever way to get these logs back to our servers without ever bothering our users. You see on Android there are many events on the phone that an application can listen for and thus respond to. In particular, the Android OS will fire such an event when you plug your phone in to charge (a broadcast Intent whose action is android.intent.action.ACTION_POWER_CONNECTED if you are playing at home). Most people plug in their phones at night when they are going to bed, and thus not using their phone. This seemed like a great time to send log files to the Bump servers.

The plan was simple. The Bump app registered a BroadcastReceiver that listened for this power connected event. This caused the Bump background service to wake up and start an upload of log files. Once it finished, the service would go back to sleep. What could go wrong? 

There was one obvious (at least in retrospect) thing that we took for granted. We assumed that this event would be fired exactly once when the phone was plugged in and then never again until the user’s phone was unplugged and plugged in again. This assumption turned out to not only be false, but nearly fatal for our servers.

Shortly after Bump 2.0 for Android came out, we started noticing a disturbing trend in our server traffic. We would see large numbers of phones connect to our servers and then do nothing -- except tie up connections to our servers. Of course we first noticed this one day while we were all hard at work here in our office in Mountain View. Further, we noticed that most of these “zombies” were coming from Japan and South Korea. We thought that perhaps there was something odd about the phones being sold over there or maybe with the wireless networks.

As the number of downloads of Bump 2.0 continued to climb, things got worse. We had first noticed the zombies from Japan because they showed up during the work day in California. There turned out to be many more zombies right here in the United States. They came out at night, every night, and went away in the morning. We didn’t notice them until they set off monitoring alarms in the middle of the night here. One night we had to block all connections coming from Verizon IP addresses in order to keep the zombies at bay.

This pattern of zombie phones late at night suggested that our send-logs-to-the-server strategy was to blame. So we changed that code, and no longer listened for the power connected event at all. This immediately killed off the zombies. Still, it left us scratching our heads about the cause of the zombies. So we wrote a little test app. This app would simply count the number of times the power connected event was fired on a device. Then we would plug in a device for a while and see how many events the app had counted.
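
The counting logic of such a test app is tiny. Here is a minimal sketch in plain Java; PowerEventCounter is a hypothetical name rather than our actual test app, and on Android its record method would be called from a BroadcastReceiver's onReceive with the current time:

```java
// Hypothetical sketch: counts events and tracks the shortest gap between them.
final class PowerEventCounter {
    private int count = 0;
    private long lastMillis = 0;
    private long shortestGapMillis = Long.MAX_VALUE;

    // Call once per received power connected event.
    synchronized void record(long nowMillis) {
        if (count > 0) {
            shortestGapMillis = Math.min(shortestGapMillis, nowMillis - lastMillis);
        }
        lastMillis = nowMillis;
        count++;
    }

    synchronized int count() {
        return count;
    }

    // Stays at Long.MAX_VALUE until at least two events have been recorded.
    synchronized long shortestGapMillis() {
        return shortestGapMillis;
    }
}
```

A well-behaved phone should leave the count at 1 for as long as it stays plugged in.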

We started testing devices, and we began with Verizon phones since we had seen large numbers of zombies on Verizon’s network. Most phones behaved just as they should, firing the event right when the phone was plugged in and never again until you unplugged it and then plugged it back in again. But one Verizon phone we tested did not: the Verizon Fascinate. Once plugged in to charge, this phone would fire the power connected event every ten seconds or so (there actually was quite a bit of variance). With our old log uploading code, this would cause our background service to wake up every ten seconds, shake hands with our servers, and then sit idle. In other words, it would create a zombie.
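
A handler can also defend itself against a device that re-fires the event. The sketch below is an assumption about one possible guard, not what we shipped (we simply stopped listening for the event): it allows at most one upload per interval. UploadThrottle and tryAcquire are hypothetical names, and on Android the check would sit inside BroadcastReceiver.onReceive before the service is woken up.

```java
// Hypothetical guard: lets a trigger through at most once per interval.
final class UploadThrottle {
    private final long minIntervalMillis;
    private boolean hasAccepted = false;
    private long lastAcceptedMillis = 0;

    UploadThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Returns true if an upload may start now, false if one was
    // already allowed less than minIntervalMillis ago.
    synchronized boolean tryAcquire(long nowMillis) {
        if (hasAccepted && nowMillis - lastAcceptedMillis < minIntervalMillis) {
            return false;
        }
        hasAccepted = true;
        lastAcceptedMillis = nowMillis;
        return true;
    }
}
```

With a one-hour interval, a phone that fires the event every ten seconds would trigger one upload per hour rather than roughly 360.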

If you are familiar with Android phones, you will recognize that the Fascinate is Verizon’s version of the Samsung Galaxy S. The Galaxy S is perhaps the most popular Android phone in the world. We tested several other Galaxy S phones, and none of them had the zombie gene like the Fascinate. The presence of zombies in Japan and South Korea suggests there are phones being sold there that have the zombie gene, perhaps some other flavor of the Galaxy S. We can’t say which one for sure. We’re just glad that we can once again sleep through the night without being woken up by zombies.

Want to work with Michael and the rest of the Bump team? We are hiring: http://bu.mp/jobs.

iOS vs. Android from the Trenches

This post was written by Indy from Bump's Android team and is the first part of a series about lessons learned while developing Bump 2.0 for Android.

After months of hard work, we shipped Bump 2.0 for Android at the end of April. It was a full rewrite of the client from the ground up to bring it onto Bump's 2.0 architecture. To make this happen, a couple of us at Bump switched to working full time on Android from iOS, and experienced all the joys and pains that come from such an experience. We've teased our former iOS teammates about how certain things are so amazing and easy to do in Android, but the truth is more nuanced. In this post I want to spend some time looking at the similarities and differences of the two platforms from both a technical perspective and a UX perspective. Though it is possible to separate these two, since we are talking about a handheld device, both these perspectives are often intertwined.

For 2.0 we wanted to build an App that felt native to Android and not a shoddy copy of an iPhone app. We ourselves are guilty of this with the previous version of our Android app (Android doesn't have tabs on the bottom!). We went through several iterations of our initial specs for 2.0 to bring it closer to a native Android look and feel.

Aside from the surface level UI differences, I found that Android's architecture is very forward looking in comparison to iOS. Every app gets a chance to deeply integrate with the operating system and other applications. There is a lot that the OS has to offer in this respect and so far we've only scratched the surface of what we can do with this at Bump.

One example of a future looking architecture decision is how different applications can flow back and forth between each other. There are two major parts of the system that make this possible: Intents and the hard back button.

Intents are part of the notification system that is built right into the operating system. Any application can listen to and act upon intents that are broadcast system wide. For example: when a user clicks on a link inside a chat in Bump, an intent to view the link is broadcast by the system. The browser is registered to respond to it, so the system slides the chat screen away and slides in the browser with the contents of the link. The user now has the power of the full Android browser to interact with the web page. When the user is done viewing this link, hitting the back button slides the chat screen back in.

This is a huge UX win! In iOS there is simply no way to make this sort of seamless interaction. Apps have to bake in a UIWebView and then give you an option to open the page in Safari if you want to get all the browser features. Once there, getting back to the previous app is a multi-step process. Android is filled with many such integration points that help users flow seamlessly between tasks. As an app developer you can even add new integration points for other apps to interact with yours.

When it comes to layouts, Android has to deal with many different screen sizes and resolutions. It provides many tools to deal with this variation. One such tool is declarative (in XML) layouts that "flow". These layouts are very similar to how layouts flow on the web. Each object in the layout has its position either defined based on an edge of the screen or relative to other objects. This layout methodology allowed us to quickly iterate on the UI by simply moving bits of XML around. Also, simple things like a line of prompt text that might become two in more verbose languages (*cough* German) don't break the UI. On-screen elements simply shift to adapt to this change.

There is, however, a flip-side to flow layouts. In iOS you can often do pixel perfect placement. In Android, with its multitudes of resolutions and screen sizes, you simply don't have that luxury. Thus the same tricks for placing elements in iOS simply don't carry over. However, this is something every designer that has moved from print to the web has had to deal with, and it's not a new problem.

Android also allows an app to run multiple OS level processes. We take advantage of this ability to do almost all of our CPU heavy and networking work in a separate background process. This is great because we don't have to worry about any of this work having much of an effect on UI performance. This is a constant struggle in iOS even if you manage your threads well.

One thing that has been disappointing with Android, though, is that animations and responsiveness of the UI are generally slower. In the parts of our app that we animate, we often see missed frames. Honestly this sucks. There are definitely things that we can do to make this better, but in the end this is because none of the graphics and animations are hardware accelerated pre-Honeycomb. Unless you specifically write OpenGL code (and now Renderscript) you won't get the same level of graphics performance that you see in normal iOS apps. Since currently only tablets run Honeycomb, we are stuck with generally not so stellar graphics performance on pretty much all current handsets.

The list of differences between the two platforms is obviously enormous. There are quite a few purely technical differences, but discussing many of them can easily degrade into religious arguments (languages, dev tools, garbage collection vs. ref counted memory management). In most of these regards both platforms are fairly comparable. The major difference is that Android's Java code runs on a relatively young VM (Dalvik) that hasn't figured out all of its optimizations and bugs yet. It's quickly catching up to its older VM cousins though.

Building for Android was definitely quite a change from iOS, but both involve building for devices with very constrained resources. We learned a lot and had some very interesting experiences. In the coming weeks we'll share more of these on this blog, including something we've been asked often about: using Scala in Android.

Want to work with Indy and the rest of the Bump team? We are hiring: http://bu.mp/jobs.

TL;DR.js

Reading all the Hacker News articles can take too long.

Sometimes an author will add a tl;dr (too long; didn't read) section at the bottom for people who just scrolled by the article without reading it. For those authors who weren't so considerate, here is a script that will do it for you. You install it by dragging it to your bookmark bar, then when you accidentally click on a really long article, don't worry, you don't have to read it. Just click the link in your bookmark bar and a nice little overview will show up at the top of the page. YMMV.

Install by dragging this link to your bookmark bar (in chrome click view->always show bookmarks bar to see it): 

 

If you want to summarize a subset of the article, select it with your mouse then click the bookmarklet.

It works by finding the most common words in the article, and then finding the sentences with the highest density of common words (excluding words like "the", "and", etc.)
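
To make the idea concrete, here is a minimal sketch of that scoring approach. It is written in Java rather than the JavaScript of the bookmarklet, and it is an illustration under assumptions, not the script itself: the TldrSketch class, its method names, the stop-word list, and the sentence-splitting rule are all made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class TldrSketch {
    // Tiny illustrative stop-word list (an assumption; the script's own list is not shown here).
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "the", "and", "a", "an", "of", "to", "in", "is", "it", "that", "for", "on"));

    // Lowercases the text and splits it into words.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    // Returns the howMany most frequent words, ignoring stop words.
    static Set<String> commonWords(String text, int howMany) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words(text)) {
            if (!STOP_WORDS.contains(w)) counts.merge(w, 1, Integer::sum);
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Most frequent first; ties broken alphabetically so the result is deterministic.
        ranked.sort((a, b) -> {
            int byCount = counts.get(b) - counts.get(a);
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new HashSet<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    // Fraction of a sentence's words that are common words.
    static double density(String sentence, Set<String> common) {
        List<String> ws = words(sentence);
        if (ws.isEmpty()) return 0.0;
        int hits = 0;
        for (String w : ws) {
            if (common.contains(w)) hits++;
        }
        return (double) hits / ws.size();
    }

    // Returns the sentenceCount densest sentences, densest first.
    static List<String> summarize(String text, int commonCount, int sentenceCount) {
        Set<String> common = commonWords(text, commonCount);
        List<String> sentences =
                new ArrayList<>(Arrays.asList(text.trim().split("(?<=[.!?])\\s+")));
        sentences.sort((a, b) -> Double.compare(density(b, common), density(a, common)));
        return new ArrayList<>(sentences.subList(0, Math.min(sentenceCount, sentences.size())));
    }
}
```

Because the score is a density rather than a raw count, short sentences packed with common words beat long sentences that mention them once or twice.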

Here is the tldr of Daring Fireball:

  • Apple un-magic is interesting
  • “Open” is to Android as “magic” is to iOS
  • iOS aims for it Android doesn’t want it  ★
  • Google hypocrisy is interesting; Apple secrecy, not so much
  • Android feels like an independent Google subsidiary

    This is a quick hack so please fork on github:

    tl;dr.js on GitHub

    Spend less time reading Hacker News, join Bump: http://bu.mp/jobs