At a former job, we reverse engineered the trading APIs of most American retail stock brokerages (Fidelity, E-Trade, Robinhood, TD Ameritrade, etc). We did it by rooting an iPhone and using Charles Proxy to grab the unencrypted traffic.
I learned a lot from that experience, and it's also just plain fun to do. We did get some strongly worded letters from Robinhood though, lol. They tried blocking our servers, but we just set up an automated system in Digital Ocean that would spin up a new droplet each time we detected a blockage, and they were never able to stop us after that. Fun times.
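For a sense of how small that kind of rotation logic can be, here is a rough sketch against the DigitalOcean droplets API; the token, snapshot image, and the block-detection heuristic are all assumptions for illustration:

```python
import requests

DO_API = "https://api.digitalocean.com/v2"
TOKEN = "your-digitalocean-token"          # assumption: token with droplet write scope
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def looks_blocked(resp):
    # Illustrative heuristic only: treat 403/429 or a captcha page as a block signal.
    return resp.status_code in (403, 429) or "captcha" in resp.text.lower()

def spin_up_replacement(snapshot_id, region="nyc3"):
    # Create a fresh droplet from a snapshot that already has the scraper baked in.
    body = {
        "name": "scraper-replacement",
        "region": region,
        "size": "s-1vcpu-1gb",
        "image": snapshot_id,
    }
    r = requests.post(f"{DO_API}/droplets", headers=HEADERS, json=body)
    r.raise_for_status()
    return r.json()["droplet"]["id"]
```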
This is how I made a better version of the nhl.com site [1]: it has a better UI (you can see scores and schedules much more easily), is mobile first, has no ads, and is fully responsive. I did the same for the AHL [2] and the PWHL [3].
[1] https://nhl-remix.vercel.app/ [2] https://ahl-remix.vercel.app/ [3] https://pwhl-remix.vercel.app/
I once talked with a guy running a scraping company. A client of his wanted all products from all of his suppliers in a single interface. One really slow website was constantly changing things specifically to make scraping impossible. He had to fix his scraper every few hours, and then they would change everything again.
I gave him the perfect solution: why don't you just call them and ask how much money they want for access to their product catalog? Give them tiny bits of information if they don't approve immediately. Tell them it is for one of their customers and that their site is too slow. They are the only supplier not in the interface, which is bad for them and bad for you. If they still refuse, offer them access to their competitors' data for a modest fee.
He couldn't stop laughing; he had never considered it. When the sun set the next morning, he put in the call. They gave him access immediately; they were so happy to finally get rid of his crawler messing up their analytics data. Apparently the people on the other end of the tube also didn't sleep. They had a good laugh about it.
I do exactly this, but for the company that I work for.
I'm on the dashboards and integrations team, and I don't have direct access to the codebase of the main product. As the internal APIs have no documentation at all, I'm always "hacking" our own system using the browser inspector to find out how our endpoints work.
> 75grand’s success was even met with jealousy from the college
That's a common story; at my university (UniPD, in Padova) something even worse happened. They tried hard to shut down an unofficial app (Uniweb) that most of the students had installed, in favor of the "official" one, which was completely unusable (and probably born out of a rigged contract). In the end the best one won and became official, but only after a lot of struggle.
The problem with this is that, a lot of the time, as they keep ramping up their anti-abuse measures, your old code breaks and needs to be rewritten (especially if it's an API you intend to use for a long-term project).
I did the same thing long (well, not that long) ago for my school’s system too: https://github.com/saagarjha/break. In retrospect this was an excellent use of time because when I made this I barely knew what JSON was and ended up reimplementing some protocols from scratch (HTTP basic auth, WebDAV) that I learned much later were actually standardized and I had actually known them all along ;) Alas, the service got bought out by some vulture/bodyshop PE firm and the app ended up outliving the servers it was hitting. But until then I got a lot of great email from high school kids telling me how much I improved their experience or asking if they could build their own thing on the APIs they found in my app after trawling GitHub.
I actually enjoy reversing APIs. A game I really liked announced end of service last year. I ended up capturing a bunch of MITM traffic, built my own API to replace theirs, and now run a private server for it.
I don't dare monetize that project, but I wish I could use my skills to make some extra money at reversing APIs. Wouldn't know where to begin though.
Sick app btw. Funny this comes up because I'm working on the exact same thing for my school.
Note that if your school uses Canvas, its API is well documented and has GraphQL endpoints.
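For reference, a minimal sketch of hitting the documented Canvas REST API with a personal access token (the domain and token are placeholders; the GraphQL endpoint lives at /api/graphql if you prefer it):

```python
import requests

BASE = "https://yourschool.instructure.com"   # placeholder Canvas domain
TOKEN = "your-canvas-access-token"            # generated under Account -> Settings

def list_courses():
    # Documented endpoint: GET /api/v1/courses returns the caller's courses.
    r = requests.get(
        f"{BASE}/api/v1/courses",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"per_page": 100},
    )
    r.raise_for_status()
    return [(c["id"], c.get("name")) for c in r.json()]

if __name__ == "__main__":
    for course_id, name in list_courses():
        print(course_id, name)
```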
I wonder how difficult it would be to combine many of these techniques into an automated script that dumps a manifest of whatever undocumented APIs it finds. LLMs have also been shown to be pretty good at answering semantic questions about large blobs of minified code, so perhaps there could be some success there too?
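A rough sketch of what a first pass at such a script might look like, just scanning binaries or minified JS bundles for URL- and API-path-shaped strings (the regexes are crude assumptions, not a rigorous grammar):

```python
import re
import sys
from collections import defaultdict
from pathlib import Path

# Crude patterns: absolute URLs plus anything that looks like a versioned API path.
URL_RE = re.compile(rb"https?://[\w.-]+(?:/[\w./?=&%-]*)?")
PATH_RE = re.compile(rb"/(?:api|v\d+)/[\w./-]+")

def scan(path):
    data = Path(path).read_bytes()
    hits = URL_RE.findall(data) + PATH_RE.findall(data)
    return {h.decode(errors="ignore") for h in hits}

def manifest(paths):
    # Map each artifact (app binary, JS bundle, ...) to the endpoint-ish strings inside it.
    found = defaultdict(set)
    for p in paths:
        found[p] |= scan(p)
    return found

if __name__ == "__main__":
    for source, endpoints in manifest(sys.argv[1:]).items():
        print(f"# {source}")
        for e in sorted(endpoints):
            print("  ", e)
```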
> The error messages helpfully suggested fields I hadn’t known about by “correcting” my typos.
Glad to see this being called out. Sure, I get why it's convenient: misspelling a field by one character is a daily occurrence ("activty" and "heirarchy" are my regulars). The catch is that spellchecking queries and returning the valid field names in the error effectively shrinks the search space along both axes, character choice and name length, with how much depending on the distance metric the spellchecker uses.
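To illustrate the leak concretely: graphql-js style servers answer unknown fields with "Did you mean ..." suggestions, which can be harvested to map out a type without introspection. A minimal sketch against a hypothetical endpoint, with the root field name and wordlist made up for illustration:

```python
import re
import requests

ENDPOINT = "https://example.com/graphql"           # hypothetical API
GUESSES = ["activty", "heirarchy", "usr", "emall"]  # deliberate near-misses

def probe(field_guess):
    # Ask for a field we suspect exists under a slightly wrong name,
    # then harvest the server's "Did you mean ..." corrections.
    query = {"query": f"{{ viewer {{ {field_guess} }} }}"}  # "viewer" root is assumed
    resp = requests.post(ENDPOINT, json=query).json()
    found = set()
    for err in resp.get("errors", []):
        # graphql-js wording: Cannot query field "x" on type "Y". Did you mean "z"?
        # (only the first suggestion is captured here; servers may offer several)
        found.update(re.findall(r'Did you mean "([^"]+)"', err.get("message", "")))
    return found

if __name__ == "__main__":
    discovered = set()
    for guess in GUESSES:
        discovered |= probe(guess)
    print(sorted(discovered))
```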
As I read through these comments I'm reminded of recon techniques commonly used in bug bounties. Reverse proxies, decompiling APKs, parsing JS files to find endpoints, etc. I often went down rabbit holes in the recon phase and never found bugs but I had fun in the process so I considered it a win anyway.
I also implemented a bot mitigation system for a large international company, so I got to see techniques used from the other side. Mobile phone farms and traffic from China were the most difficult to mitigate.
Most of these techniques are extremely old and very outdated.
Teams that I've seen working on apps now implement much stronger checks on their APIs, especially on mobile: attestation like SafetyNet/Play Integrity on Android and DeviceCheck on iOS, among other methods, which makes simply running strings far less useful.
And most apps are now encrypted so you just see junk in the logs.
It's really fascinating watching the constant push and pull between projects like yt-dlp and platforms like YouTube and TikTok. There are so many interesting techniques for making reverse engineering or unofficial requests more difficult. There is even a tiny JavaScript engine, IIRC, that calculates a special value YouTube uses to verify requests.
A technique not covered here is setting up a wifi hotspot to impersonate a device. I wanted to reverse engineer the protocol used by an inexpensive action camera. I ran the Android app through a decompiler (JADX), and the app supported many different types of cameras, so it wasn't clear which calls were related to the one I owned.
So I set up a Raspberry Pi broadcasting the same SSID as the camera and logged the first API call the app made. Then I ran that call against the real camera and wrote a small Python script on the Pi to return the same result. I did that call by call until I'd worked everything out. I wrote it up here: https://www.hotelexistence.ca/reverse-engineer-akaso-ek7000/
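A minimal sketch of the replay half of that setup, assuming the app speaks plain HTTP to a fixed camera address and that the real camera's responses have already been saved to disk (the file layout here is invented):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

RESPONSES = Path("captured")   # e.g. captured/getdeviceinfo.json, saved from the real camera

class FakeCamera(BaseHTTPRequestHandler):
    def do_GET(self):
        # Log what the app asked for, then replay a canned response if we have one.
        print("app requested:", self.path)
        canned = RESPONSES / (self.path.strip("/").replace("/", "_") + ".json")
        if canned.exists():
            body = canned.read_bytes()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # The Pi broadcasts the camera's SSID and answers on the address the app expects;
    # port 80 needs root, so run with sudo or use a higher port plus a redirect.
    HTTPServer(("0.0.0.0", 80), FakeCamera).serve_forever()
```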
bund.dev is a group of volunteers doing exactly that for APIs run by the German government. I met Lilith at a meetup recently. Her talk was super interesting.
https://bund.dev/
https://links.lilithwittmann.de/
> Mobile apps have no choice but to use HTTP APIs. You can easily download a lot of iOS apps through the Mac App Store, then run strings on their bundles to look for endpoints.
Are there any good tutorials for that? 'strings' is not the easiest name to search for.
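strings(1) itself is the classic Unix tool: point it at the app's main executable and grep for "http". If a Python version is easier to tinker with, a rough equivalent (assuming the binary isn't encrypted) looks like this:

```python
import re
import sys

# Rough strings(1) equivalent: pull printable ASCII runs out of a binary
# and keep the ones that look like URLs or API paths.
PRINTABLE = re.compile(rb"[\x20-\x7e]{6,}")

def endpoints(binary_path):
    with open(binary_path, "rb") as f:
        data = f.read()
    candidates = (s.decode() for s in PRINTABLE.findall(data))
    return sorted({s for s in candidates if "http" in s or s.startswith("/api")})

if __name__ == "__main__":
    for line in endpoints(sys.argv[1]):
        print(line)
```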
Did this with Redlib to allow access to Reddit, and I have been building tooling to let academic researchers access data at large scale at no cost :)
https://github.com/redlib-org/redlib
I’ve built the leading salary transparency tool in my country and one other by reverse engineering an API. It took me a few hours, and now 6,000 people use the tool. I think my tool is responsible for their changes to rate limiting, lol.
I’ve reverse engineered a few industry conference apps to more easily get the list of attendees (some conferences will literally only give you a pdf of scanned paper lists of contact info for attendees, which is insane, especially if you are paying to have a booth there).
I’ve either decompiled the Android app, or run mitmproxy on the device, or both, to figure it out.
Does anyone have a recommendation for a pre-rooted Android emulator with the needed tools installed and ready to go, to make this a quicker process? I’d love to just drag and drop an APK into an emulator and start inspecting, versus having to use a real device and all that jazz.
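For the mitmproxy route mentioned above, the logging side can be a tiny addon; a minimal sketch (run it with mitmproxy -s log_api.py after trusting the proxy's CA certificate on the device or emulator), with the backend host as a placeholder:

```python
# log_api.py -- mitmproxy addon that records the app's API traffic.
from mitmproxy import http

TARGET = "api.example-conference.com"   # placeholder: the app's backend host

def response(flow: http.HTTPFlow) -> None:
    # Called once per completed request/response pair the proxy sees.
    if TARGET in flow.request.pretty_host:
        print(flow.request.method, flow.request.pretty_url, flow.response.status_code)
        print("  request body:", flow.request.get_text(strict=False))
```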
The browser debugger is my #1 resource. There is almost always a way to get stuff by replaying network requests. No matter how much they obfuscate (I'm looking at you, Quora), it is still possible to replicate the requests and use the site like an API. Some real estate businesses and shadier data providers use browser detection to keep automation out, but it is usually a minor inconvenience, since you can run browsers headless or under xvfb if they have to render.
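The usual workflow is to find the request in the network tab, copy it as cURL, and re-issue it with the same headers and cookies. A minimal sketch of what that tends to look like in Python, with every value a placeholder from DevTools:

```python
import requests

# Headers and cookie copied from the browser's network tab; values are placeholders.
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (copied from the real browser)",
    "Accept": "application/json",
    "Referer": "https://example.com/listings",
    "X-Requested-With": "XMLHttpRequest",
})
session.cookies.update({"sessionid": "value-from-devtools"})

# Replay the same XHR the page makes, then iterate over pages as needed.
resp = session.get(
    "https://example.com/api/search",
    params={"q": "2 bedroom", "page": 1},
)
resp.raise_for_status()
print(resp.json())
```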
I went to the 75grand app listed in the article, saw a listing for Cafe Mac, and did a double take. Apple's employee cafe is Caffè Macs, so I was quite confused for a second.
This approach is generally seen as unwanted by website owners (it's worth noting that automated API clients are distinct from regular user agents). As a “reverse engineer”, you have no idea how expensive a given endpoint is for them to serve.
Instead, I'd recommend reaching out to the website owners directly to discuss your API needs - they're often interested in hearing about potential integrations and use cases.
If you don't receive a response, proceeding with unauthorized API usage is mostly abusive and poor internet citizenship.
It's especially annoying since many use binary message formats, and there isn't a great way to document an arbitrary binary message protocol.
A couple of techniques I'm trying out:
- websocat and wsrepl for reverse engineering interactively: https://github.com/doyensec/wsrepl
- kaitai struct for documenting the binary message formats: https://kaitai.io/
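For simple cases, plain Python struct is sometimes enough to parse and document a frame at the same time; a minimal sketch for a hypothetical length-prefixed message (the layout is invented for illustration, and it's exactly the sort of thing a Kaitai .ksy file captures more formally):

```python
import struct

# Hypothetical frame layout, documented inline:
#   u16  message type   (big-endian)
#   u32  payload length (big-endian, in bytes)
#   ...  payload
HEADER = struct.Struct(">HI")

def build_frame(msg_type: int, payload: bytes) -> bytes:
    return HEADER.pack(msg_type, len(payload)) + payload

def parse_frame(buf: bytes):
    msg_type, length = HEADER.unpack_from(buf, 0)
    payload = buf[HEADER.size:HEADER.size + length]
    return msg_type, payload

# Round trip as a sanity check.
frame = build_frame(0x01, b'{"hello": "world"}')
print(parse_frame(frame))
```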
It's just an endless cat & mouse game.
https://gizmodo.com/ring-s-hidden-data-let-us-map-amazons-sp...
Give ‘em a shout out by saying “hi” in a request parameter or something while you’re reverse engineering.
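Or put it in the User-Agent, which also gives the operator a way to reach you; a tiny sketch (contact address and endpoint are placeholders):

```python
import requests

# Identify yourself so the operator knows who is calling and how to reach you.
headers = {"User-Agent": "my-hobby-scraper/1.0 (contact: you@example.com)"}
requests.get("https://example.com/api/schedule", headers=headers)
```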