Show HN: I built an AI that turns GitHub codebases into easy tutorials

(github.com)

Comments

bilalq 20 April 2025
This is actually really cool. I just tried it out using an AI Studio API key and was pretty impressed. One issue I noticed was that the output was a little too much "for dummies". Spending paragraphs to explain what an API is through restaurant analogies is a little unnecessary, and it then follows up with more paragraphs on what GraphQL is. Every chapter seems to suffer from this. The generated documentation seems more suited to a slightly technical PM than to a software engineer. This can probably be mitigated by refining the prompt.

The prompt might also be better if it encouraged variety in diagrams. For some things, a flow chart would fit better than a sequence diagram (e.g., a durable state machine workflow written using AWS Step Functions).

swashbuck1r 21 April 2025
While the doc generator is a useful example app, the really interesting part is how you used Cursor to start a PocketFlow design doc for you, then fine-tuned the details of the design doc to describe the PocketFlow execution graph and utilities you wanted the design of the doc-generator to follow…and then used Cursor to generate all the code for the doc-generator application.

This really shows off that the simple node graph, shared storage and utilities patterns you have defined in your PocketFlow framework are useful for helping the AI translate your documented design into (mostly) working code.

Impressive project!

See design doc https://github.com/The-Pocket/Tutorial-Codebase-Knowledge/bl...

And video https://m.youtube.com/watch?v=AFY67zOpbSo

mooreds 20 April 2025
I had not used Gemini before, so I spent a fair bit of time yak shaving to get access to the right APIs and set up my Google project. (I have an OpenAI key, but it wasn't clear how to use that service.)

I changed it to use this line:

   api_key=os.getenv("GEMINI_API_KEY", "your-api_key")
instead of the default project/location option.

and I changed it to use a different model:

    model = os.getenv("GEMINI_MODEL", "gemini-2.5-pro-preview-03-25")
I used the preview model because I got rate limited and the error message suggested it.
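
Putting the two changes together, my edited call_llm looked roughly like this (an approximation, assuming the google-genai client with API-key auth instead of the default Vertex AI project/location setup):

    import os
    from google import genai

    def call_llm(prompt: str) -> str:
        # Authenticate with an API key instead of a project/location
        client = genai.Client(api_key=os.getenv("GEMINI_API_KEY", "your-api_key"))
        model = os.getenv("GEMINI_MODEL", "gemini-2.5-pro-preview-03-25")
        response = client.models.generate_content(model=model, contents=prompt)
        return response.text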

I used this on a few projects from my employer:

- https://github.com/prime-framework/prime-mvc a largish open-source MVC Java framework my company uses. I'm not overly familiar with this, though I've read a lot of code written in this framework.

- https://github.com/FusionAuth/fusionauth-quickstart-ruby-on-... a smaller example application I reviewed and am quite familiar with.

- https://github.com/fusionauth/fusionauth-jwt a JWT Java library that I've used but not contributed to.

Overall thoughts:

Lots of exclamation points.

Thorough overview, including of some things that were not application-specific (Rails routing).

Great analogies. Seems to lean on them pretty heavily.

Didn't see any inaccuracies in the tutorials I reviewed.

Pretty amazing overall!

manofmanysmiles 20 April 2025
I love it! I effectively achieve similar results by asking Cursor lots of questions!

Like at least one other person in the comments mentioned, I would like a slightly different tone.

Perhaps a good feature would be a "style template" that could be chosen to match your preferred writing style.

I may submit a PR, though not if it takes a lot of time.

TheTaytay 20 April 2025
Woah, this is really neat. My first step for many new libraries is to clone the repo, launch Claude code, and ask it to write good documentation for me. This would save a lot of steps for me!
Too 21 April 2025
How well does this work on unknown code bases?

The tutorial on requests looks uncanny for something generated with no prior context. The use cases and examples it gives are too specific. It is making up terminology for concepts that are not mentioned once in the repository, like "functional api" and "hooks checkpoints". There must be thousands of tutorials on requests online that every AI was already trained on. How do we know that it is not using them?

fforflo 20 April 2025
If you want to use Ollama to run local models, here’s a simple example:

    from ollama import chat, ChatResponse

    def call_llm(prompt, use_cache: bool = True, model="phi4") -> str:
        response: ChatResponse = chat(
            model=model,
            messages=[{'role': 'user', 'content': prompt}],
        )
        return response.message.content
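
Quick smoke test (assumes Ollama is running locally and you've pulled the model with "ollama pull phi4"):

    print(call_llm("Say hello in one sentence"))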

chairhairair 19 April 2025
A company (mutable ai) was acquired by Google last year for essentially doing this but outputting a wiki instead of a tutorial.
gregpr07 20 April 2025
I built browser use. Dayum, the results for our lib are really impressive, you didn't touch the outputs at all? One problem we have is keeping the docs in sync with the current codebase (code examples break sometimes). Wonder if I could use parts of Pocket to help with that.
remoquete 21 April 2025
This is nice and fun for getting some fast indications on an unknown codebase, but, as others said here and elsewhere, it doesn't replace human-made documentation.

https://passo.uno/whats-wrong-ai-generated-docs/

esjeon 20 April 2025
At the top there is some neat high-level stuff, but below that it quickly turns into code-written-in-human-language.

I think it should be possible to extract some more useful usage patterns by poking into the related unit tests. How to use the code should be what matters most to tutorial readers.

iamsaitam 25 April 2025
BLOATED. This project is 100 lines of code, but everything that is non-code related is bloated like a gas giant. All the text and videos are written by an LLM. The author would learn from understanding that QUANTITY isn't QUALITY; toning down the verbiage would greatly benefit what they are trying to communicate.

PS: The generated "design documents" are 2k+ lines long. This seems like a great way to exceed quotas.

mattfrommars 20 April 2025
WTF

You built it in one afternoon? I need to figure out these mythical abilities.

I thought about this idea a few weeks back but could not figure out how to implement it.

Amazing job OP

fforflo 20 April 2025
With $GEMINI_MODEL=gemini-2.0-flash I also got some decent results for libraries like simonw/llm and pgcli.

You can tell that because simonw writes quite heavily documented code and the logic is pretty straightforward, it helps the model a lot!

https://github.com/Florents-Tselai/Tutorial-Codebase-Knowled...

https://github.com/Florents-Tselai/Tutorial-Codebase-Knowled...

amelius 20 April 2025
I've said this a few times on HN: why don't we use LLMs to generate documentation? But then came the naysayers ...
mvATM99 20 April 2025
This is really cool and very practical. Definitely will try it out for some projects soon.

Can see some finetuning after generation being required, but assuming you know your own codebase that's not an issue anyway.

citizenpaul 21 April 2025
This is really cool. One of the best AI things I've seen in the last two years.
wg0 20 April 2025
That's a game changer for a new Open source contributor's onboarding.

Put in the Postgres or Redis codebase, get a good understanding, and get going on contributing.

kaycebasques 20 April 2025
Very cool, thanks for sharing. I imagine that this will make a lot of my fellow technical writers (even more) nervous about the future of our industry. I think the reality is more along the lines of:

* Previously, it was simply infeasible for most codebases to get a decent tutorial for one reason or another. E.g. the codebase is someone's side project and they don't have the time or energy to maintain docs, let alone a tutorial, which is widely regarded as one of the most labor-intensive types of docs.

* It's always been hard to persuade businesses to hire more technical writers because it's perennially hard to connect our work to the bottom or top line.

* We may actually see more demand for technical writers because it's now more feasible (and expected) for software projects of all types to have decent docs. The key future skill would be knowing how to orchestrate ML tools to produce (and update) docs.

(But I'm also under no delusion: it's definitely possible for TWs to go the way of the dodo bird and animatronics professionals.)

I think I have a very good way to evaluate this "turn GitHub codebases into easy tutorials" tool but it'll take me a few days to write up. I'll post my first impressions to https://technicalwriting.dev

P.S. There has been a flurry of recent YC startups focused on automating docs. I think it's a tough space. The market is very fragmented. Because docs are such a widespread and common need, I imagine that a lot of the best practices will get commoditized and open sourced (exactly like Pocket Flow is doing here).

potamic 20 April 2025
Did you measure how much it cost to run it against your examples? Trying to gauge how much it would cost to run this against my repos.
Retr0id 19 April 2025
The overview diagrams it creates are pretty interesting, but the tone/style of the AI-generated text is insufferable to me - e.g. https://the-pocket.github.io/Tutorial-Codebase-Knowledge/Req...
nitinram 23 April 2025
This is super cool! I attempted to use this on a project and kept running into "This model's maximum context length is 200000 tokens. However, your messages resulted in 459974 tokens. Please reduce the length of the messages." I used OpenAI o4-mini. Is there an easy way to handle this gracefully? Basically, do you have thoughts on how to make tutorials for really large codebases or project directories?
1899-12-30 21 April 2025
As an extension to this general idea: AI generated interactive tutorials for software usage might be a good product. Assuming it was trained on the defined usage paths present in the code, it would be able to guide the user through those usages.
stephantul 20 April 2025
The dspy tutorial is amazing. I think dspy is super difficult to understand conceptually, but the tutorial explained it really well.
theptip 20 April 2025
Yes! AI for docs is one of the usecases I’m bullish on. There is a nice feedback loop where these docs will help LLMs to understand your code too. You can write a GH action to check if your code change / release changes the docs, so they stay fresh. And run your tutorials to ensure that they remain correct.
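
A rough sketch of that last step (a hypothetical helper, assuming the generated tutorials keep their examples in fenced python blocks): pull the code blocks out of each tutorial file and execute them in CI, so a stale example fails the build.

    import pathlib
    import re
    import sys

    CODE_BLOCK = re.compile(r"```python\n(.*?)```", re.DOTALL)

    def check_tutorial(path: str) -> bool:
        # Execute every python block in a tutorial markdown file
        text = pathlib.Path(path).read_text()
        for i, block in enumerate(CODE_BLOCK.findall(text), start=1):
            try:
                exec(compile(block, f"{path}:block{i}", "exec"), {})
            except Exception as exc:
                print(f"{path} block {i} failed: {exc}")
                return False
        return True

    if __name__ == "__main__":
        sys.exit(0 if all(check_tutorial(p) for p in sys.argv[1:]) else 1)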
badmonster 19 April 2025
Do you have plans to expand this to include more advanced topics like architecture-level reasoning, refactoring patterns, or onboarding workflows for large-scale repositories?
ganessh 20 April 2025
Does it use the docs in the repository or only the code?
lummm 22 April 2025
I actually have created something very similar here: https://github.com/Black-Tusk-Data/crushmycode, although with a greater focus on 'pulling apart' the codebase for onboarding. So many potential applications of the resultant knowledge graph.
bionhoward 20 April 2025
“I built an AI”

Looks inside

REST API calls

pknerd 20 April 2025
Interesting... would you like to share some technical details? It doesn't seem like you used RAG here?
lastdong 21 April 2025
Great stuff, I may try it with a local model. I think the core logic for the final output is all in the nodes.py file, so I guess one can try and tweak the prompts, or create a template system.
chbkall 20 April 2025
Love this. These are the kind of AI applications we need which aid our learning and discovery.
zarkenfrood 20 April 2025
Really nice work and thank you for sharing. These are great demonstrations of the value of LLMs, which helps counter the negative view of their impact on junior engineers. This helps bridge the gap of most projects lacking up-to-date documentation.
touristtam 20 April 2025
Just need to find a way to integrate this into the deployment pipeline and output some markdown (or another format) to send to whatever your company is using (or simply a live website), I'd say.
android521 20 April 2025
For anyone dismissing AI as pure hype, this is a counterexample of its usefulness.
thom 20 April 2025
This is definitely a cromulent idea, although I’ve realised lately that ChatGPT with search turned on is a great balance of tailoring to my exact use case and avoiding hallucinations.
trash_cat 20 April 2025
This is literally what I use AI for. Excellent project.
orsenthil 20 April 2025
It would be good to integrate a local web server to fire up and read the docs. I use VS Code's markdown preview, and that works too. Cool project.
polishdude20 20 April 2025
Is there an easy way to have this visit a private repository? I've got a new codebase to learn and it's behind credentials.
gbraad 21 April 2025
Interesting, but gawd awful analogy: "like a takeout order app". It tries to be amicable, which feels uncanny.
andrewrn 20 April 2025
This is brilliant. I would make great use of this.
bdg001 20 April 2025
I was using gitdiagram, but LLMs are very bad at generating good, error-free Mermaid code!

Thanks buddy! This will be very helpful!

throwaway314155 20 April 2025
I suppose I'm just a little bit bothered by your saying you "built an AI" when all the heavy lifting is done by a pretrained LLM. Saying you made an AI-based program or hell, even saying you made an AI agent, would be more genuine than saying you "built an AI" which is such an all-encompassing thing that I don't even know what it means. At the very least it should imply use of some sort of training via gradient descent though.
rtcoms 22 April 2025
I would be very interested in knowing how you built this.
andybak 20 April 2025
Is there a way to limit the number of exclamation marks in the output?

It seems a trifle... overexcited at times.

dangoodmanUT 20 April 2025
It appears to be leveraging the docs and learned tokens more than the actual code. For example, I don't believe it could achieve that understanding of LevelDB without prior knowledge and the extensive material it has probably already been trained on.
las_nish 20 April 2025
Nice project. I need to try this
anshulbhide 20 April 2025
Love this kind of stuff on HN
souhail_dev 20 April 2025
That's amazing, I was looking for something like this a while ago. Thanks!
lasarkolja 20 April 2025
Can anyone turn nextcloud/server into an easy tutorial?
throwaway290 20 April 2025
You didn't "build an AI". It's more like you wrote a prompt.

I wonder why all the examples are from projects that already have great docs, so it doesn't even need to read the actual code.

CalChris 20 April 2025
Do one for LLVM and I'll definitely look at it.
firesteelrain 20 April 2025
Can this work with Codeium Enterprise?
saberience 20 April 2025
I hate this language: "built an AI", did you train a new model to do this? Or are you in fact calling ChatGPT 4o, or Sonnet 3.7 with some specific prompts?

If you trained a model from scratch to do this I would say you "built an AI", but if you're just calling existing models in a loop then you didn't build an AI. You just wrote some prompts and loops and did some RAG, which isn't building an AI and isn't particularly novel.

mraza007 20 April 2025
Impressive work.

With the rise of AI, understanding software will become relatively easy.

chyueli 20 April 2025
Great, I'll try it next time, thanks for sharing
lionturtle 20 April 2025
>:( :3
ryao 19 April 2025
I would find this more interesting if it made tutorials out of the Linux, LLVM, OpenZFS, and FreeBSD codebases.
istjohn 20 April 2025
This is neat, but I did find an error in the output pretty quickly:

  # Use the Session as a context manager
  with requests.Session() as s:
      s.get('https://httpbin.org/cookies/set/contextcookie/abc')
      response = s.get(url) # ???
      print("Cookies sent within 'with' block:", response.json())
https://the-pocket.github.io/Tutorial-Codebase-Knowledge/Req...