# Why are we not having a separate repo
Do you think we should? Why? (not saying we'll change, but we would welcome and appreciate you sharing your reasoning, of course)
Separating documentation from the main code repository for open-source technologies offers a ton of advantages. Here's a breakdown, expanding on your points and adding a few more:
- Increased Contribution Accessibility
Lower Barrier to Entry: Contributors don't need to set up a full development environment just to fix a typo or clarify a sentence. This makes contributing much more approachable, especially for non-developers like technical writers, students, or users who want to improve the docs.
Simplified Workflow: Editing Markdown files is straightforward and familiar to many. This streamlines the contribution process, encouraging more people to participate.
- Improved Documentation Quality
Focused Reviews: Pull requests can focus solely on documentation changes, making reviews faster and more efficient.
Less Clutter: Dedicated documentation repositories avoid getting lost in the noise of code commits, making it easier to track changes and maintain version history.
Enhanced Organization: Separate repos allow for better structuring and organization of documentation, with dedicated sections, tutorials, and examples.
- LLM-Ready Documentation
Structured Content: Markdown's inherent structure makes it easier for LLMs to parse and understand the content, enabling them to generate summaries, answer questions, and even create tutorials.
Metadata and Semantic Tagging: A dedicated repo allows for better use of metadata and semantic tags within the documentation, further enhancing LLM comprehension. This is exactly what Anthropic is doing to improve Claude's ability to access and utilize information.
- Faster Updates and Easier Maintenance
Independent Release Cycles: Documentation can be updated more frequently, independent of the software release cycle.
Simplified Versioning: Versioning documentation separately makes it easier to align with specific software releases and provide accurate information.
- Enhanced Discoverability
Dedicated Search Optimization: A separate documentation website or repository can be optimized for search engines, making it easier for users to find the information they need.
Examples:
Many popular open-source projects are already adopting this approach:
- Kubernetes: Kubernetes has extensive documentation in a dedicated GitHub repository.
- React: React's documentation is also maintained in a separate repository.
In Conclusion
Separating documentation from the main codebase is a best practice that empowers open-source projects to have better, more accessible, and LLM-ready documentation. This leads to a more vibrant community, improved user experience, and ultimately, a more successful project.
And in the dagger repo we are also maintaining archived docs, which is very confusing for LLMs: they often answer Dagger questions based on an older version. That is too much friction for devs, and it creates more iterations rather than fewer when trying to work faster with AI.
And we can have text or JSON ready for LLMs like Anthropic did...
Check this
https://x.com/alexalbert__/status/1857457290917589509?t=vlkmc9IH6jQ0YAWr5-egkw&s=19
Friday docs feature drop:
You can now access all of our docs concatenated as a single plain text file that can be fed in to any LLM.
Here's the url route: https://t.co/ILkO9q4fyk
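For illustration, a rough sketch of how a single concatenated plain-text file could be built from a docs tree. This is not how Anthropic generates theirs; the function name `concat_docs`, the separator format, and the file extensions are all assumptions.

```python
from pathlib import Path


def concat_docs(docs_root, extensions=(".md", ".mdx")):
    """Join every Markdown/MDX file under docs_root into one plain-text string.

    Hypothetical helper: each file is preceded by a separator line carrying
    its path relative to docs_root, so an LLM can tell the pages apart.
    """
    root = Path(docs_root)
    parts = []
    # Sort for a stable, reproducible output order.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in extensions:
            rel = path.relative_to(root).as_posix()
            text = path.read_text(encoding="utf-8").strip()
            parts.append(f"===== {rel} =====\n{text}\n")
    return "\n".join(parts)
```

Running something like this per docs version would give one file per version, which avoids mixing archived and current pages in the same context.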
my $0.02 is that most of these points are about process and are not tied to mono vs poly repo setups
Some of the points made seem incorrect, for example the points about Markdown. Dagger docs use Docusaurus, a very popular documentation framework, which does use Markdown (technically MDX). You can also have independent release cycles with a monorepo; it's really about how you set up your CI and release process.
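To make the CI point concrete, here is a minimal sketch, assuming docs live under `docs/` in the monorepo, of routing a change set to separate docs and code pipelines. The function name and the pipeline labels are hypothetical and do not describe Dagger's actual CI setup.

```python
def pipelines_for(changed_paths, docs_prefix="docs/"):
    """Decide which pipelines a change set should trigger.

    Hypothetical routing rule: anything under docs_prefix triggers the docs
    pipeline; anything else triggers the code pipeline. A change set that
    touches both triggers both.
    """
    touches_docs = any(p.startswith(docs_prefix) for p in changed_paths)
    touches_code = any(not p.startswith(docs_prefix) for p in changed_paths)
    return {"docs": touches_docs, "code": touches_code}
```

With a rule like this, a docs-only PR can publish the site without cutting a software release, while a mixed PR still ships code and docs together.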
I've personally moved more towards having the docs site in the same repo as the core project. It makes maintenance easier imho
Arguably, having all the versions of the docs available is beneficial for LLMs. If I'm on an older version of Dagger, I want the LLM to reference those docs, not the latest.
This problem seems more about how to supply an LLM query with the proper context, which is generally a challenge for a lot of reasons
No, getting code suggestions for an older version from an LLM while you're on the latest version always breaks your code. So I think that's not good.
Just some feedback
Getting the latest version of the docs when using an older version of the tool is the same problem, reversed direction
But projects like Dagger will always suggest staying on the latest version. And it's advisable to use the latest version for better features and performance...
The LLM should know what version you are using and be able to reference that version of the docs
But that's a little bit tricky.
It's not always in our hands to train them to give suggestions for a specific version
That's what RAG & agents are for
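A minimal sketch of the version-aware retrieval idea, assuming each docs chunk is tagged with the version it came from. The `DocChunk` type, the version labels, and the naive keyword-overlap scoring are all illustrative assumptions; a real RAG setup would typically use embeddings instead.

```python
from dataclasses import dataclass


@dataclass
class DocChunk:
    version: str  # docs version this chunk came from (hypothetical label)
    text: str


def retrieve(chunks, query, version, top_k=3):
    """Return up to top_k chunks for `version`, ranked by keyword overlap."""
    terms = set(query.lower().split())
    scored = []
    for chunk in chunks:
        if chunk.version != version:
            continue  # never mix docs from other versions into the context
        overlap = len(terms & set(chunk.text.lower().split()))
        if overlap:
            scored.append((overlap, chunk))
    # Highest overlap first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

The key design choice is filtering by version before ranking, so a user on an older release only ever gets context from that release's docs, and vice versa.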
for example, we do not have control over when foundational models are trained, nor what their training data is (when did they last scrape the docs...?)
Not every developer who uses Dagger can have all of this set up on their machine...
Think about all the end users of Dagger
Think about the community, not only expert developers.
Foundational models are Claude, Gemini, OpenAI's models, and so on, and they don't run on developer machines
There are tools like Kapa being built https://www.kapa.ai/ to fill this gap
Kapa.ai turns your knowledge base into a reliable and production-ready LLM-powered AI assistant that answers technical questions instantly. Trusted by 100+ startups and enterprises incl. OpenAI, Docker, Mapbox, Mixpanel and NextJS.
But globally, developers use GitHub Copilot more
I don't understand why these points aren't true with the current docs structure.
All the docs are under docs/ - just like they'd be under dagger/docs on github - they're in a separate place.
Having docs in a separate repo also comes with a significant cost: making code+docs changes in parallel is much more difficult. This is something we already struggle with, and I don't want to make that problem worse. Additionally, it makes our automation harder; there are some very non-trivial dependencies (e.g. automagically generated docs from the code)
Moving the docs doesn't automagically give us all these benefits - we'd need to do restructuring, decluttering, etc. The decision to do that isn't tied to a separate repo.
We are exploring a GitHub Copilot extension based on RAG from docs and examples. Agree it could be helpful.
We attempt to use a monorepo-like structure for our own codebase - it makes it much easier for us as core devs + docs folks to hack on things in parallel. If we decide to take on the extra cost of splitting it out, we need to have a concrete benefit to the team - this isn't a user facing feature
Docusaurus is popular, so the LLMs ought to be trained to recognize it and handle it accordingly. In other words, the promise of LLMs is that they are flexible and adaptable to us, not that we should all conform to some standard that is ideal for them
Just sharing my two cents about preferring to stay with the current docs structure. The most valuable feature for me is the fact that in a single PR we can address both code and docs changes. This makes it extremely easy to ship features in a consistent way while keeping the flow simple.
From what I understand of the LLM arguments @echo jolt has shared above, it doesn't seem to me that any of those points directly apply to how we're managing / structuring our docs, or justify moving them to a separate repo. Having said that, I'm not opposed to re-evaluating this as long as there's a solid argument to move forward.
So yes, I think along @torn epoch 's line of thought here.
Or we can do something like this with every release for each programming language either in text or json
We can change code and docs at the same time even if they are in separate repos, with either Git submodules or a VS Code multi-root workspace
yes, that's fine. Users have requested this already. This doesn't mean that we need to move our docs to a separate repo to make this happen
Agree
We've tried keeping docs in a separate repo. We've always moved docs back to where the code is because it's really hard to manage. So my vote is to keep docs where the code is. The other issues with AI can be solved by tuning those models properly.
yes, I'm not saying it's not possible. I'm saying that it's easier and simpler to keep things how they are. So far I personally haven't seen any argument that justifies making this change.
Yes cool. Agree
awesome! @echo jolt are you ok on closing this thread then?
Any issue to track this or should I create a new one?
Yes obv. But we should do something for LLM-ready context.
there's no current issue from what i can quickly find
we can open an issue in dagger/dagger for that. cc @indigo shale
Okay then I will create one soon.
I'll go ahead and close this thread. Thx everyone for your thoughts
Okay cool, thanks a lot everyone for listening. I really appreciate it.
Dagger team rocks.