All the data can be yours: reverse engineering APIs

(jero.zone)

Comments

danielvaughn 11 November 2024
At a former job, we reverse engineered the trading APIs of most American retail stock brokerages (Fidelity, E-Trade, Robinhood, TD Ameritrade, etc.). We did it by jailbreaking an iPhone and using Charles Proxy to grab the unencrypted traffic.

I learned a lot from that experience, and it's also just plain fun to do. We did get some strongly worded letters from Robinhood though, lol. They tried blocking our servers, but we set up an automated system in DigitalOcean that would spin up a new droplet each time we detected a block, and after that they were never able to stop us.
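
The rotation logic is roughly this shape (a minimal sketch against DigitalOcean's public droplet API; the droplet name, region, size, and the block heuristic are placeholders, not the actual setup):

    import os
    import requests

    DO_API = "https://api.digitalocean.com/v2/droplets"
    HEADERS = {"Authorization": f"Bearer {os.environ['DIGITALOCEAN_TOKEN']}"}

    def looks_blocked(resp):
        # Placeholder heuristic: treat 403/429 (or a captcha page) as a block
        return resp.status_code in (403, 429)

    def spin_up_replacement():
        # Create a fresh droplet so traffic comes from a new IP
        payload = {
            "name": "scraper-worker",
            "region": "nyc3",
            "size": "s-1vcpu-1gb",
            "image": "ubuntu-22-04-x64",
        }
        r = requests.post(DO_API, json=payload, headers=HEADERS, timeout=30)
        r.raise_for_status()
        return r.json()["droplet"]["id"]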

Fun times.

stoplight 11 November 2024
This is how I made a better version of the nhl.com site [1]: it has a cleaner UI (you can see scores/schedules much more easily), is mobile-first and responsive, and has no ads. I did the same for the AHL [2] and the PWHL [3].

[1] https://nhl-remix.vercel.app/ [2] https://ahl-remix.vercel.app/ [3] https://pwhl-remix.vercel.app/

6510 12 November 2024
I once talked with a guy running a scraping company. A client of his wanted all products from all of his suppliers in a single interface. One really slow website was constantly changing things specifically to make scraping impossible. He had to fix his scraper every few hours, and then they would change everything again.

I gave him the perfect solution: why don't you just call them and ask how much money they want for access to their product catalog? Give them tiny bits of information if they don't approve immediately. Tell them it is for one of their customers and that their site is too slow. They are the only supplier not in the interface, which is bad for them and bad for you. If they still refuse, offer them access to their competitors' data at a modest fee.

He couldn't stop laughing; he had never considered it. When the sun rose the next morning he put in the call. They gave him access immediately; they were so happy to finally get rid of his crawler messing up their analytics data. Apparently the people on the other end of the tube also hadn't slept. They had a good laugh about it.

rubslopes 11 November 2024
I do exactly this, but for the company that I work for.

I'm on the dashboards and integrations team, and I don't have direct access to the codebase of the main product. As the internal APIs have no documentation at all, I'm always "hacking" our own system using the browser inspector to find out how our endpoints work.

Lucasoato 11 November 2024
> 75grand’s success was even met with jealousy from the college

That's a common story; at my university (Padova, UniPD) something even worse happened. They tried hard to shut down an unofficial app (Uniweb) that most of the students had installed in favor of the "official" one, which was completely unusable (and probably born out of a rigged contract). In the end the best one won and became official, but only after a lot of struggle.

faizshah 12 November 2024
Anyone come up with good techniques for reverse engineering websockets?

It's especially annoying since many use binary message formats and there isn't a great way to document an arbitrary binary message protocol.

A couple of techniques I'm trying out (with a quick frame-dumping sketch after the list):

- websocat and wsrepl for reverse engineering interactively: https://github.com/doyensec/wsrepl

- kaitai struct for documenting the binary message formats: https://kaitai.io/
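
A minimal Python sketch of the first approach, just dumping raw frames as hex so they can be annotated later (the URL is a placeholder; a real target will usually also need the cookies/headers the browser sends):

    import asyncio
    import websockets  # pip install websockets

    async def dump_frames(url):
        async with websockets.connect(url) as ws:
            async for message in ws:
                if isinstance(message, bytes):
                    # Hex-dump binary frames for offline annotation (e.g. in Kaitai Struct)
                    print(message.hex(" "))
                else:
                    print("text:", message)

    asyncio.run(dump_frames("wss://example.com/socket"))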

brutecat 12 November 2024
The problem with this is that, a lot of the time, as they keep ramping up their anti-abuse measures, your old code breaks and needs to be rewritten (especially if it's an API you intend to use for a long-term project).

It's just an endless cat & mouse game.

saagarjha 12 November 2024
I did the same thing long (well, not that long) ago for my school's system too: https://github.com/saagarjha/break. In retrospect this was an excellent use of time, because when I made this I barely knew what JSON was and ended up reimplementing some protocols from scratch (HTTP basic auth, WebDAV) that I learned much later were actually standardized, so I had known them all along ;) Alas, the service got bought out by some vulture/bodyshop PE firm and the app ended up outliving the servers it was hitting. But until then I got a lot of great email from high school kids telling me how much I improved their experience or asking if they could build their own thing on the APIs they found in my app after trawling GitHub.

Gamemaster1379 12 November 2024
I actually enjoy reversing APIs. A game I really liked to play announced end of service (EOS) last year. I ended up capturing a bunch of MITM traffic, built my own API to replace it, and now run a private server for it.

I don't dare monetize that project, but I wish I could use my skills to make some extra money at reversing APIs. Wouldn't know where to begin though.

gaeb69 11 November 2024
Sick app btw. Funny this comes up, because I'm working on the exact same thing for my school. Note that if your school uses Canvas, Canvas' API is well documented and has GraphQL endpoints.
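
For example, listing your courses over the documented REST API is roughly this (domain and token are placeholders):

    import requests

    CANVAS = "https://your-school.instructure.com"  # placeholder domain
    TOKEN = "token-generated-in-account-settings"   # placeholder token

    resp = requests.get(
        f"{CANVAS}/api/v1/courses",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    for course in resp.json():
        print(course["id"], course.get("name"))
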
treyd 11 November 2024
I wonder how difficult it would be to combine many of these techniques into some automated script that dumps a manifest of the different undocumented APIs a site exposes. LLMs have also been shown to be pretty good at answering semantic questions about large blobs of minified code; perhaps there could be some success there?

z3c0 11 November 2024
> The error messages helpfully suggested fields I hadn’t known about by “correcting” my typos.

Glad to see this being called out. Sure, I get why it's convenient; misspelling a field by one character is a daily occurrence ("activty" and "heirarchy" are my regulars). The catch is that spellchecking queries and returning valid fields in the error effectively reduces the entropy an attacker has to search through, across both character space and message length, depending on the type of distance metric used in the spellcheck.
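
As a toy illustration of why that matters (the endpoint and error format here are entirely hypothetical), the leak makes field discovery scriptable:

    import requests

    def harvest_fields(url, near_misses):
        found = set()
        for guess in near_misses:
            resp = requests.get(url, params={"fields": guess})
            # Assumes JSON errors like: unknown field 'activty', did you mean 'activity'?
            message = resp.json().get("error", "")
            if "did you mean" in message:
                found.add(message.split("'")[-2])
        return found

    print(harvest_fields("https://api.example.com/items", ["activty", "heirarchy"]))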

ideashower 12 November 2024
Someone did this to the Ring mobile app to get a dataset of all (public) Ring cameras across the US

https://gizmodo.com/ring-s-hidden-data-let-us-map-amazons-sp...

thesurlydev 12 November 2024
As I read through these comments I'm reminded of recon techniques commonly used in bug bounties: reverse proxies, decompiling APKs, parsing JS files to find endpoints, etc. I often went down rabbit holes in the recon phase and never found bugs, but I had fun in the process, so I considered it a win anyway.
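
The JS-parsing step in particular can be as simple as a regex pass over the downloaded bundle (the pattern and usage below are just illustrative):

    import re
    import sys

    # Pull anything that looks like an absolute URL or an API path out of a JS bundle
    ENDPOINT_RE = re.compile(r"""["'](https?://[^"']+|/api/[^"']+)["']""")

    with open(sys.argv[1], encoding="utf-8", errors="ignore") as f:
        bundle = f.read()

    for endpoint in sorted(set(ENDPOINT_RE.findall(bundle))):
        print(endpoint)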

I also implemented a bot mitigation system for a large international company, so I got to see the techniques used from the other side. Mobile phone farms and traffic from China were the most difficult to mitigate.

colesantiago 11 November 2024
Most of these techniques are extremely old and very outdated.

Teams I've seen working on apps now implement much stronger checks on their APIs, such as attestation via SafetyNet on Android and DeviceCheck on iOS, among other methods, which makes simply running strings on the binary to find endpoints rather basic.

And most apps are now encrypted, so you just see junk in the logs.

xfeeefeee 12 November 2024
It's really fascinating watching the constant push and pull between projects like yt-dlp and sites like YouTube and TikTok. There are so many interesting techniques for making reverse engineering or unofficial requests more difficult. There is even a tiny JavaScript engine, IIRC, that calculates a special value YouTube uses to verify requests.

miki123211 12 November 2024
I think good APIs are one of the most important and least obvious advantages of SPAs.

raudette 12 November 2024
A technique not covered here is setting up a wifi hotspot to impersonate a device. I wanted to reverse engineer the protocol used by an inexpensive action camera. I ran the Android app through a decompiler (JADX), and the app supported many different types of cameras, so it wasn't clear which calls were related to the one I owned.

So I set up a Raspberry Pi with the same SSID as the camera and logged the first API call. Then I ran that call against the real camera and wrote a small Python script on the Pi to return the same result. I did that, call by call, until I'd worked everything out. I wrote it up here: https://www.hotelexistence.ca/reverse-engineer-akaso-ek7000/
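
The responder on the Pi can be as small as this (the endpoint path and payload are placeholders for whatever the real camera returned):

    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/v1/status")  # placeholder path taken from the decompiled app
    def status():
        # Replay the response previously recorded from the real camera
        return jsonify({"battery": 87, "recording": False})

    if __name__ == "__main__":
        # The Pi advertises the camera's SSID, so the app connects here instead
        app.run(host="0.0.0.0", port=80)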

nicbou 12 November 2024
bund.dev is a group of volunteers doing exactly that for APIs run by the German government. I met Lilith at a meetup recently. Her talk was super interesting.

https://bund.dev/

https://links.lilithwittmann.de/

sccxy 11 November 2024
> Mobile apps have no choice but to use HTTP APIs. You can easily download a lot of iOS apps through the Mac App Store, then run strings on their bundles to look for endpoints.

Are there any good tutorials for that? 'strings' is not the greatest name to search for if you want good information.
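
The command is the classic Unix strings(1), piped into grep for URLs. A rough Python equivalent of that step, with an example bundle path:

    import re

    URL_RE = re.compile(rb"https?://[\x21-\x7e]{4,}")

    def find_endpoints(path):
        with open(path, "rb") as f:
            return sorted(set(URL_RE.findall(f.read())))

    for url in find_endpoints("/Applications/SomeApp.app/Contents/MacOS/SomeApp"):
        print(url.decode(errors="replace"))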

sodality2 12 November 2024
Did this with Redlib to allow access to Reddit, and have been building tooling to allow academic researchers to access data at a large scale at no cost :)

https://github.com/redlib-org/redlib

zzo38computer 13 November 2024
I have also sometimes done this, but usually because the existing UI isn't very good (and sometimes it doesn't work at all).

rcpt 11 November 2024
Anyone have this for Twitter? I want to remove most of my tweets but the official API costs $200
purple-leafy 12 November 2024
I’ve built the leading satay transparency tool in my country and one other by reverse engineering an API. It took me a few hours; now 6,000 people use the tool. I think my tool is responsible for their changes to rate limiting, lol.

klabetron 12 November 2024
Macalester ‘07 here. Back in my day their LDAP directory was public (at least on the campus network) which I used to scrape student & professor lists.
dchuk 11 November 2024
I’ve reverse engineered a few industry conference apps to more easily get the list of attendees (some conferences will literally only give you a PDF of scanned paper lists of attendee contact info, which is insane, especially if you are paying to have a booth there). I’ve either decompiled the Android app, run mitmproxy on the device, or both, to figure it out. Does anyone have a recommendation for a pre-rooted Android emulator with the needed tools installed and ready to go, to make this a quicker process? I’d love to just drag and drop an APK into an emulator and start inspecting, versus having to use a real device and all that jazz.

nurettin 13 hours ago
The browser debugger is my #1 resource. There is almost always a way to get stuff by replaying network requests. No matter how much they obfuscate (I'm looking at you, Quora), it is still possible to replicate the requests and use the site like an API. Some real estate businesses and shadier data providers use browser detection to block automation, but it's usually a minor inconvenience since you can run browsers headless or under Xvfb if they have to render.
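
Replaying a captured request outside the browser tends to look like this (URL, headers, and cookie values are placeholders copied from the network tab):

    import requests

    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0",           # mirror whatever the browser sent
        "X-Requested-With": "XMLHttpRequest",  # plus any custom headers the site expects
    })
    session.cookies.set("session_id", "value-from-devtools")

    resp = session.get("https://example.com/internal/api/listings", params={"page": 1})
    print(resp.json())
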
speakspokespok 12 November 2024
Can someone recommend a long form guide that covers best practices for handling authentication and the like?
geniium 12 November 2024
Very interesting read! Gives some ideas on how to integrate data that isn't easy to access. Thanks!

lyime 12 November 2024
If you want to do this and get paid for it, come talk to us at Terminal49.

anothername12 12 November 2024
There’s probably a group of poor bastards on call at 5am, poring over alerts and logs in Datadog etc., wondering WTF is wrong with their applications.

Give ‘em a shout out by saying “hi” in a request parameter or something while you’re reverse engineering.

tomblomfield 12 November 2024
W
matthewfcarlson 11 November 2024
I went to the 75grand app listed in the article, saw a listing for Cafe Mac, and did a double take. Apple's employee cafe is Caffè Macs, so I was quite confused for a second.

Eikon 11 November 2024
This approach is generally seen as unwanted by website owners (it's worth noting that automated API clients are distinct from regular user agents). As a “reverse engineer”, you have no idea how expensive it is for an endpoint to process a request.

Instead, I'd recommend reaching out to the website owners directly to discuss your API needs - they're often interested in hearing about potential integrations and use cases.

If you don't receive a response, proceeding with unauthorized API usage is mostly abusive and poor internet citizenship.