A Tutorial for Reverse Engineering Your Software's Private API: Hacking Your Couch

Nikolay (MSc) started his career with a Google internship, worked full-stack, built iOS apps, and now loves to help startups launch MVPs.

Traveling is my passion, and I’m a huge fan of Couchsurfing. Couchsurfing is a global community of travelers, where you can find a place to stay or share your own home with other travelers. On top of that, Couchsurfing helps you enjoy a genuine traveling experience while interacting with locals. I’ve been involved with the Couchsurfing community for over 3 years. I attended meetups at first, and then I was finally able to host people. What an amazing journey it was! I’ve met so many incredible people from all over the world and made lots of friends. This whole experience truly changed my life.

I’ve hosted a lot of travelers myself, many more times than I’ve actually surfed. While living in one of the major tourist destinations on the French Riviera, I received an enormous number of couch requests (up to 10 a day during high season). As a freelance back-end developer, I immediately noticed that the couchsurfing.com website doesn’t really handle such “high-load” cases properly. There is no information about the availability of your couch - when you receive a new couch request, you can’t be sure whether you are already hosting someone at that time. There should be a visual representation of your accepted and pending requests, so you can manage them better. Also, if you could make your couch availability public, you could avoid unnecessary couch requests. To better understand what I have in mind, take a look at the Airbnb calendar.

Lots of companies are notorious for not listening to their users. Knowing the history of Couchsurfing, I couldn’t count on them to implement this feature anytime soon. Ever since the website became a for-profit company, the community deteriorated. To better understand what I’m talking about, I suggest reading these two articles:

I knew that a lot of community members would be happy to have this functionality. So, I decided to make an app to solve this problem. It turns out there is no public Couchsurfing API available. Here is the response I received from their support team:

“Unfortunately we have to inform you that our API is not actually public and there are no plans at the moment to make it public.”

Breaking Into My Couch

It was time to use some of my favorite software reverse engineering techniques to break into Couchsurfing.com. I assumed that their mobile apps must use some sort of API to query the backend. So, I had to intercept the HTTP requests coming from a mobile app to the backend. For that purpose I set up a proxy in the local network, and connected my iPhone to it to intercept HTTP requests. This way, I was able to find access points of their private API and figure out their JSON payload format.

Finally I created a website which serves the purpose of helping people manage their couch requests, and show surfers a couch availability calendar. I published a link to it on the community forums (which are also quite segmented in my opinion, and it’s difficult to find information there). The reception was mostly positive, although some people didn’t like the idea that the website required couchsurfing.com credentials, which was a matter of trust really.

The website worked like this: you log in to the website with your couchsurfing.com credentials, and after a few clicks you get the HTML code which you can embed into your couchsurfing.com profile, and voila - you have an automatically updated calendar in your profile. Below is a screenshot of the calendar, and here are the articles on how I made it:

Example calendar

I’ve created a great feature for Couchsurfing, and I naturally assumed that they would appreciate my work - perhaps even offer me a position on their development team. I sent an email to jobs(at)couchsurfing.com with a link to the website, my resume, and a reference. A thank-you note left by one of my couchsurfing guests:

Thank you note.

A few days later they followed up on my reverse engineering efforts. In the reply it was clear that the only thing they were concerned about was their own security, so they asked me to take down the blog posts I had written about the API, and eventually the website. I took down the posts immediately, as my intention was not to violate the terms of use and fish for user credentials, but rather to help the couchsurfing community. I had the impression that I was treated as a criminal, and that the company focused solely on the fact that my website required user credentials.

I proposed to give them my app for free. They could host it on their environment and connect it through Facebook authentication. After all, it is a great feature, and the community needed it. Here is the final resolution I received:

“We are getting back into the swing of things here after the holidays and wanted to follow up.

We have had some internal discussion about your application and how we could both honor the creativity and initiative it shows while not potentially compromising the privacy and security of Couchsurfing users’ data when they enter their credentials into a third-party site.

The calendar clearly fills a feature hole on our site, a feature that is part of a larger project that we are working on now.

But the issue of collecting usernames and passwords remains. We couldn’t come up with an easy way to set it up so that we could host or support that on our side without either allowing you to access that data or have your site be seen as our work product.

The API that is currently available is soon to be replaced with a version that will require authentication/authorization from applications that access it.”

Today while I’m writing this reverse engineering software tutorial (a year after the events), the calendar feature is still not implemented on Couchsurfing.

Return To Innocence - Hacking My Couch, Again

A few weeks ago I was inspired to write an article about the techniques to reverse engineer private APIs. Naturally, I decided to summarise the previous articles I’ve written on this topic, and add a few more details. As I started to write the new article I wanted to showcase the reverse engineering process with an up-to-date API and take another stab at API hacking. Based on my previous experience, and the fact that Couchsurfing recently announced a completely new website and mobile app http://blog.couchsurfing.com/the-future-of-couchsurfing-is-on-the-way/, I’ve decided to hack their API again.

Why am I doing this reverse engineering process? Well, first of all it’s a lot of fun to reverse engineer software in general. What I particularly like about it, is that it doesn’t involve just your technical skill, but also your intuition. Sometimes, the best way to figure things out is to make an educated guess - it’ll save you lots of time compared to brute-force. Recently I heard a story from a company which had to work with proprietary APIs and little or no documentation. They’d been struggling to decrypt the API response payload in an unknown format for days, then someone decided to try ?decode=true at the end of the url and they had a proper JSON. Sometimes, if you are lucky, all you need to do is prettify the JSON response.

Another reason I’m doing this tutorial is that it takes ages for some companies to adopt a particular feature requested by their users. Rather than waiting for it to be implemented, you can harness the power of their private API and build it yourself.

So, with the new couchsurfing.com API, I started with a similar approach and I installed their latest iOS app.

First, you need to set up a proxy in your LAN to intercept HTTP requests coming from the app to the API by performing a man-in-the-middle attack (MITM).

For unencrypted connections the attack is quite simple - a client connects to the proxy and you relay incoming requests to the destination server back and forth. You could possibly modify the payload, if necessary. In a public WLAN, it’s fairly easy to perform this under disguise by impersonating the WiFi router.

For encrypted connections, there is a minor difference: all the requests are encrypted end-to-end. It’s not possible for the attacker to decrypt the messages unless they somehow get access to the private key (which of course is not sent during these interactions). Having said that, even though the API communication channel is secure, the endpoints - especially the client - are not that safe.

The following conditions have to be met in order for SSL to work properly:

  • The server’s certificate has to be signed with a trusted certificate authority (CA)
  • The server’s common name, in the certificate, must match the domain name of the server

To overcome the encryption in a MITM attack, our proxy needs to act as a CA (Certificate Authority) and generate certificates on the fly. For example, if a client tries to connect to www.google.com, the proxy dynamically creates a certificate for www.google.com and signs it. Now, the client thinks that the proxy is in fact www.google.com.

This diagram outlines the steps to reverse engineer a private API.

To implement a sniffing proxy used to reverse engineer the private API, I’ll use a tool called mitmproxy. You can use any other transparent HTTPS proxy. Charles is another example with a nice GUI. To make this work we need to set up the following things:

  • Configure your phone’s WiFi connection default gateway to be the proxy (so that the proxy is in the middle and all the packets pass through)
  • Install the proxy’s certificate on the phone (so that the client has the proxy’s public key in its trust store)

Check your proxy’s documentation about installing the certificate. Here are the instructions for mitmproxy. And here is the certificate PEM file for iOS.

To monitor intercepted HTTP requests, you simply launch mitmproxy and connect to it from your mobile phone (default port is 8080).

Mobile phone settings.

Open a website in your mobile browser. At this point you should be able to see the traffic in mitmproxy.

Once you have confirmed everything is working, the reverse software engineering can begin.

Once you make sure that everything works as planned, it’s time to get started exploring the private API of your choice. Basically, at this point you can just open the app, play around with it and get an idea about API endpoints and request structure.

There is no strict algorithm on how to reverse engineer a software API - most of the time you rely on your intuition and make assumptions.

My approach is to replicate the API calls and play with different options. A good start is to replay a request you caught in mitmproxy, and see if it works (Press ‘r’ to replay a request). The first step is to figure out which headers are obligatory. It’s pretty convenient to play with headers with mitmproxy: press ‘e’ to enter editing mode, then ‘h’ to modify headers. With the shortcuts they use, vim addicts would feel right at home. You can also use browser extensions like Postman to test the API, but they tend to add unnecessary headers, so I suggest sticking to mitmproxy or curl.

I’ve made a script that reads mitmproxy dump file and generates a curl string - https://gist.github.com/nderkach/bdb31b04fb1e69fa5346
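To give an idea of what such a conversion involves, here is a minimal sketch (it is not the gist above). It assumes the request has already been pulled out of the dump into plain Python values; the function name `to_curl` and the example header and body values are hypothetical.

```python
import shlex


def to_curl(method, url, headers, body=None):
    """Build a shell-safe curl command line for one captured request."""
    parts = ["curl", "-X", method]
    for name, value in headers.items():
        parts += ["-H", f"{name}: {value}"]
    if body is not None:
        # --data-binary sends the payload exactly as captured
        parts += ["--data-binary", body]
    parts.append(url)
    # shlex.quote escapes each argument so the line can be pasted into a shell
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    # Placeholder header and body, only to show the shape of the output
    print(to_curl(
        "POST",
        "https://hapi.couchsurfing.com/api/v2/sessions",
        {"Content-Type": "application/json"},
        body='{"placeholder": true}',
    ))
```

Passing only the headers you want to keep makes it easy to drop them one by one and see which ones the server actually requires.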

Let’s start with the request sent when you are logging in.

POST https://hapi.couchsurfing.com/api/v2/sessions
← 200 application/json

The first step in this reverse engineering tutorial is to replicate the API calls and play with the resulting options.

The first thing I noticed is that every request contains a mandatory header X-CS-Url-Signature which is different every time. I also tried to replay a request after a while to check if there is a timestamp check on the server, and there is none. Next thing to do is to figure out how this signature is calculated.

At this point I decided to reverse-engineer the binary and figure out the algorithm. Naturally, having experience developing for iPhone and having an iPhone at my disposal, I decided to start with the iPhone ipa (iPhone app deliverable). Turns out to decrypt one, I need a jailbroken phone. Stop! Hammer Time.

Then, I recalled that they have an Android app as well. I was a bit hesitant to try this approach, as I don’t know anything about Android or Java. I then thought it would be a good chance to learn something new. It turned out to be easier to get human-readable quasi source code by decompiling Java bytecode than heavily optimized iPhone machine code.

An APK (Android app deliverable) is basically a zip file. You can use any zip extractor to unpack its contents. You will find a file called classes.dex, which contains Dalvik bytecode. Dalvik is a virtual machine used to run translated Java bytecode on Android.
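If you prefer to script the extraction, a minimal sketch with Python's standard zipfile module could look like this. The function name `extract_dex` and the file name `app.apk` are hypothetical; larger apps may ship additional files such as classes2.dex, so the sketch collects every classes*.dex entry.

```python
import zipfile


def extract_dex(apk_path, out_dir):
    """Extract every classes*.dex entry from an APK and return the entry names."""
    with zipfile.ZipFile(apk_path) as apk:
        dex_names = [
            name for name in apk.namelist()
            if name.startswith("classes") and name.endswith(".dex")
        ]
        for name in dex_names:
            apk.extract(name, out_dir)
    return dex_names


if __name__ == "__main__":
    import sys

    # Usage: python extract_dex.py app.apk output_dir
    print(extract_dex(sys.argv[1], sys.argv[2]))
```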

To decompile the .dex file into .java source code I used the tool called dex2jar. The output of this tool is a jar file, which you can decompile with a variety of tools. You can even open a jar in Eclipse or IntelliJ IDEA and it will do all the work for you. Most of these tools produce a similar result. We don’t really care if we can compile it back to run it, we are merely using it to analyze the source code.

Here is a list of tools I’ve tried:

CFR and FernFlower worked the best for me. JD-GUI was unable to decompile some critical parts of the code and was useless, while the others were about the same in quality. Luckily, it seems that the Java code hasn’t been heavily obfuscated. Be aware that tools like ProGuard http://developer.android.com/tools/help/proguard.html are commonly used to obfuscate Android apps, in which case you would need to deobfuscate the code first.

Java decompilation is not really within the scope of this reverse engineering tutorial - there is a lot written on this topic, so let’s assume you successfully decompiled and deobfuscated your Java code.

I’ve combined all the relevant code used to calculate X-CS-Url-Signature in the following gist: https://gist.github.com/nderkach/d11540e9af322f1c1c74

Firstly, I searched for mentions of X-CS-Url-Signature, which I found in RetrofitHttpClient. One particular call seemed interesting - to the EncUtils module. Digging into it, I realized that they are using HMAC SHA1. HMAC is a message authentication code which combines a cryptographic hash function (SHA1 in this case) with a secret key to compute a keyed hash of a message. It’s used to ensure integrity (i.e. to prevent a man in the middle from modifying the request) and authentication.

We need two things to calculate the X-CS-Url-Signature: the private key and the encoded message (probably some variation of HTTP request payload and URL).

final String a2 = EncUtils.a(EncUtils.a(a, s));
 
final ArrayList<Header> list = new ArrayList<Header>(request.getHeaders());
list.add(new Header("X-CS-Url-Signature", a2));

In the code a is a message and s is the key which are used to compute the header a2 (the double call to EncUtils just computes an HMAC SHA1 hex digest).
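For readers more comfortable with Python than with decompiled Java, here is a minimal sketch of the same computation using the standard library. The function name `hmac_sha1_hex` is hypothetical, and encoding the key as UTF-8 and producing lowercase hex are assumptions rather than something confirmed by the decompiled code.

```python
import hashlib
import hmac


def hmac_sha1_hex(key: str, message: bytes) -> str:
    """Return the HMAC-SHA1 of message under key as a lowercase hex string."""
    # Assumption: the key string is turned into bytes with UTF-8
    return hmac.new(key.encode("utf-8"), message, hashlib.sha1).hexdigest()


if __name__ == "__main__":
    # Placeholder key and message, not the real values from the app
    print(hmac_sha1_hex("placeholder-key", b"/api/v2/sessions"))
```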

Finding the key wasn’t a problem - it was stored in plain text in ApiModule, and was used to initialize the second parameter of RetrofitHttpClient.

RetrofitHttpClient a(OkHttpClient okHttpClient) {
    return new RetrofitHttpClient(okHttpClient, "v3#[email protected]#XreXeGCh");
}

If we look at the call to EncUtils, we can see that the string literal above is used verbatim as a key to calculate the HMAC, except in the case when this.b is defined. In the latter case, this.b is appended to the key, separated by a dot.

String s;
if (this.b == null) {
    s = this.a;
}
else {
    s = this.a + "." + this.b;
}

Now, just by looking at the code it wasn’t clear to me where and how this.b is initialized (the only thing I was able to discover is that it’s set in a method with the signature a(String b), but I couldn’t find a call to it anywhere in the code).

public void a(final String b) {
    this.b = b;
}

I encourage you to decompile it and find out yourself :)

Figuring out the message was pretty straightforward - in the code you can see that it’s a concatenation of the URL path, i.e. /api/v2/sessions, and a string with the JSON payload (if any).

final byte[] b = this.b(request.getUrl());
byte[] a;
if (request.getBody() != null && request.getBody() instanceof JsonTypedOutput) {
    System.out.println("body");
    // this.a(x, y) concatenates byte arrays  
    a = this.a(b, ((JsonTypedOutput)request.getBody()).a);
}
else {
    a = b;
}
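The same message construction can be sketched in Python as follows. The function name `build_message` is hypothetical, and the sketch assumes that only the path component of the URL is signed; how the app treats a query string is not something established here.

```python
from typing import Optional
from urllib.parse import urlsplit


def build_message(url: str, json_body: Optional[bytes] = None) -> bytes:
    """Concatenate the URL path with the raw JSON payload, if there is one."""
    # Assumption: only the path is used, e.g. /api/v2/sessions
    path = urlsplit(url).path.encode("utf-8")
    if json_body is not None:
        return path + json_body
    return path


if __name__ == "__main__":
    # Placeholder payload, only to show the concatenation
    print(build_message(
        "https://hapi.couchsurfing.com/api/v2/sessions",
        b'{"placeholder": true}',
    ))
```

Note that the payload has to be the exact bytes sent on the wire: re-serializing the JSON with different spacing or key order would change the message and therefore the signature.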

Just by looking at the code, it was difficult to figure out the exact algorithm for HMAC calculation. So, I decided to rebuild the app with debugging symbols to figure out exactly how the app works. I’ve used a tool called apktool https://code.google.com/p/android-apktool/ to disassemble the Dalvik bytecode using smali https://code.google.com/p/smali/. I followed the guide at https://code.google.com/p/android-apktool/wiki/SmaliDebugging

After you build the apk you need to sign and install it on your device. As I didn’t have an Android device, I used the emulator which comes with Android SDK. With some spoonfeeding, here is how you do it:

jarsigner -verbose -keystore ~/.android/debug.keystore -storepass android -keypass android <path_to_your_built_apk> androiddebugkey

jarsigner -verify -verbose -certs <path_to_your_built_apk>

zipalign -v 4 <path_to_your_built_apk> <path_to_your_output_signed_apk>

I used the built-in Android emulator which comes with the SDK and an Atom x86 virtual image with HAXM enabled to ensure it runs smoothly.

tools/emulator -avd mydroid -no-boot-anim -cpu-delay 0

Here is a nice guide on how to set up a virtual image: http://jolicode.com/blog/speed-up-your-android-emulator

To confirm that HAXM is enabled, make sure you see the line “HAX is working and emulator runs in fast virt mode” on emulator startup.

Then, I installed the apk into the emulator and ran the app. Following the apktool guide, I leveraged IntelliJ IDEA remote debugger to connect to the emulator and set some line breakpoints:

Some reverse engineering techniques involve running the app and just seeing what happens.

Playing with the app for a little bit, I was able to figure out that the private key used to initialize RetrofitHttpClient is used for calculating the HMAC signature of a login request. In the response to the login POST you receive a user ID and an accessToken (X-Access-Token). The access token is used to authorize all the following requests. The HMAC for all the post-login requests is constructed the same way as for the login request, except that the key is composed by appending .<user_id> to the original private key.
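Put together, the key selection and header construction described above can be sketched like this. The function names `signing_key` and `signed_headers` are hypothetical, the UTF-8 key encoding is an assumption, and the key, user ID, and token in the usage example are placeholders.

```python
import hashlib
import hmac


def signing_key(base_key, user_id=None):
    """Use the bare key for login, and key.<user_id> once the user is known."""
    if user_id is None:
        return base_key
    return f"{base_key}.{user_id}"


def signed_headers(base_key, message, user_id=None, access_token=None):
    """Build the signature header, plus the access token header after login."""
    key = signing_key(base_key, user_id)
    # Assumption: the key is UTF-8 encoded and the digest is sent as hex
    signature = hmac.new(key.encode("utf-8"), message, hashlib.sha1).hexdigest()
    headers = {"X-CS-Url-Signature": signature}
    if access_token is not None:
        headers["X-Access-Token"] = access_token
    return headers


if __name__ == "__main__":
    # Placeholder key, user ID, and token
    print(signed_headers("placeholder-key", b"/api/v2/users/123",
                         user_id=123, access_token="placeholder-token"))
```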

This shows the authorization process necessary to reverse engineer this private API.

Once you are authorized, the app sends the following request:

POST https://hapi.couchsurfing.com/api/v2/users/1003669205/registerDevice
← 200 application/json

As I was able to deduce empirically, this request is optional for authentication. Bonus points if you figure out what it’s used for!

Once authenticated, you can send a request to fetch your (or anyone else’s) user profile, like this:

GET https://hapi.couchsurfing.com/api/v2/users/1003669205
← 200 application/json

In this reverse engineering process you can fetch anyone’s user profile.

I didn’t go much into details, but I noticed that a profile is updated with a PUT request. Just for fun, I tried to update another profile with the same request - it wasn’t authorized, so apparently the security basics are implemented.

I wrote a simple Python script to log in using your couchsurfing.com credentials and get your user profile: https://gist.github.com/nderkach/899281d7e6dd0d497533. Here is the Python wrapper for the API: https://github.com/nderkach/couchsurfing-python with a package available in the PyPI repository (pip install couchsurfing).

Next Steps

I’m not sure what exactly I’m going to do with the API this time. HTML code in user profiles is no longer allowed, so I’ll have to come up with a different approach to the old problem. I’ll continue developing and enhancing the Python API wrapper, if there is a demand for it, and assuming that couchsurfing.com will not cause too many problems. I didn’t explore the API too much, and I just tested it for some basic vulnerabilities. It seems secure enough, but it would be interesting to find out if you can get access to data which is not available through the website. Either way, now you can use my reverse engineering work to build an alternative client for Windows Phone, Pebble, or your smart-couch.

Wrap-up With A Question

There is a discussion I’d like to open - why not publish your API and make it public? Even if I hadn’t managed to hack the API, it would still be possible to scrape the website. It would be slower and more difficult to maintain, but surely they’d prefer consumers to use an API rather than a web scraper. Availability of the APIs would allow third-party developers to improve the company’s product, and build value-added services around it. One can make an argument that it would be more expensive to maintain a public API than a private one; but then again, the advantages of your community building services on top of your product would outweigh the API maintenance costs.

Is it possible to completely prevent usage of a private API by third-party clients? I don’t think so. Using SSL pinning would prevent sniffing API requests with the simple transparent proxy technique described earlier. In the end, though, even if you obfuscate the binary, a motivated hacker with some resources and time will always be able to reverse-engineer the app binary and get the private key/certificate. I think the assumption that the client endpoint is secure is inherently wrong. An API client is a weak spot.
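To make the pinning idea concrete, here is a minimal sketch of the check a pinning client performs: it compares a fingerprint of the certificate presented by the server with one shipped inside the app, so a certificate generated on the fly by a proxy would not match. The function name `certificate_is_pinned` is hypothetical, and pinning the SHA-256 of the whole certificate is an assumption for illustration; real apps often pin the public key instead.

```python
import hashlib
import hmac


def certificate_is_pinned(der_cert: bytes, pinned_sha256_hex: str) -> bool:
    """Check a DER-encoded certificate against a pinned SHA-256 fingerprint."""
    actual = hashlib.sha256(der_cert).hexdigest()
    # compare_digest avoids leaking where the two strings differ
    return hmac.compare_digest(actual, pinned_sha256_hex.lower())
```

In Python, the DER bytes of the peer certificate can be obtained from an established TLS socket with getpeercert(binary_form=True). Since the pinned value has to live somewhere in the client, it can still be found and patched out by someone who decompiles the app.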

By keeping an API private, a company is basically conveying a message of mistrust to its users. Surely, you can try to protect your private API even further. However, wouldn’t you rather implement basic security for the API to prevent malicious usage, and instead focus your resources on improving the software to provide a better user experience?

Couchsurfing, pretty please, with sugar on top, open the API.