#Why are we not having a separate repo


edgy token
#

Do you think we should? Why? (not saying we'll change, but we would welcome and appreciate you sharing your reasoning, of course)

echo jolt
#

Separating documentation from the main code repository for open-source technologies offers a ton of advantages. Here's a breakdown, expanding on your points and adding a few more:

  1. Increased Contribution Accessibility

Lower Barrier to Entry: Contributors don't need to set up a full development environment just to fix a typo or clarify a sentence. This makes contributing much more approachable, especially for non-developers like technical writers, students, or users who want to improve the docs.

Simplified Workflow: Editing Markdown files is straightforward and familiar to many. This streamlines the contribution process, encouraging more people to participate.

  2. Improved Documentation Quality

Focused Reviews: Pull requests can focus solely on documentation changes, making reviews faster and more efficient.

Less Clutter: Dedicated documentation repositories avoid getting lost in the noise of code commits, making it easier to track changes and maintain version history.

Enhanced Organization: Separate repos allow for better structuring and organization of documentation, with dedicated sections, tutorials, and examples.

#
  3. LLM-Ready Documentation

Structured Content: Markdown's inherent structure makes it easier for LLMs to parse and understand the content, enabling them to generate summaries, answer questions, and even create tutorials.

Metadata and Semantic Tagging: A dedicated repo allows for better use of metadata and semantic tags within the documentation, further enhancing LLM comprehension. This is exactly what Anthropic is doing to improve Claude's ability to access and utilize information.

  4. Faster Updates and Easier Maintenance

Independent Release Cycles: Documentation can be updated more frequently, independent of the software release cycle.

Simplified Versioning: Versioning documentation separately makes it easier to align with specific software releases and provide accurate information.

  5. Enhanced Discoverability

Dedicated Search Optimization: A separate documentation website or repository can be optimized for search engines, making it easier for users to find the information they need.

Examples:
Many popular open-source projects are already adopting this approach:

  • Kubernetes: Kubernetes has extensive documentation in a dedicated GitHub repository.
  • React: React's documentation is also maintained in a separate repository.

In Conclusion
Separating documentation from the main codebase is a best practice that empowers open-source projects to have better, more accessible, and LLM-ready documentation. This leads to a more vibrant community, improved user experience, and ultimately, a more successful project.

#

😀 And in the dagger repo we are also maintaining archived docs, which is very confusing for LLMs: they often give answers based on an older version for Dagger questions. That is too much friction for devs, and it creates more iterations rather than fewer when trying to work faster with AI

lone mesa
#

my $0.02 is that most of these points are about process and are not tied to mono vs poly repo setups

Some of the points made seem incorrect, for example the points about Markdown. The Dagger docs use Docusaurus, a very popular documentation framework, which does use Markdown (technically MDX). You can also have independent release cycles with a monorepo; it's really about how you set up your CI and release process.

I've personally moved more towards having the docs site in the same repo as the core project. It makes maintenance easier imho

lone mesa
echo jolt
lone mesa
#

Getting the latest version of the docs when using an older version of the tool is the same problem, reversed direction

echo jolt
lone mesa
#

The LLM should know what version you are using and be able to reference that version of the docs

echo jolt
lone mesa
#

That's what RAG & agents are for

#

for example, we do not have control over when foundational models are trained, nor what their training data is (when did they last scrape the docs...?)

echo jolt
lone mesa
#

Foundational models are Claude, Gemini, OpenAI's models, and they don't exist on developer machines
There are tools like Kapa being built https://www.kapa.ai/ to fill this gap

Kapa.ai turns your knowledge base into a reliable and production-ready LLM-powered AI assistant that answers technical questions instantly. Trusted by 100+ startups and enterprises incl. OpenAI, Docker, Mapbox, Mixpanel and NextJS.

echo jolt
torn epoch
# echo jolt Separating documentation from the main code repository for open-source technolog...

I don't understand why these points aren't true with the current docs structure.

All the docs are under docs/ - just like they'd be under dagger/docs on github - they're in a separate place.

Having docs in a separate repo also comes with a significant cost: making code+docs changes in parallel becomes much more difficult. This is something we already struggle with, and I don't want to make that problem worse. Additionally, it makes our automation harder; there are some very non-trivial dependencies (e.g. automagically generated docs from the code)

Moving the docs doesn't automagically give us all these benefits - we'd need to do restructuring, decluttering, etc. The decision to do that isn't tied to a separate repo.

edgy token
torn epoch
#

We attempt to use a monorepo-like structure for our own codebase - it makes it much easier for us as core devs + docs folks to hack on things in parallel. If we decide to take on the extra cost of splitting it out, we need to have a concrete benefit to the team - this isn't a user facing feature

lone mesa
#

Docusaurus is popular, so the LLMs ought to be trained to recognize it and handle it accordingly. In other words, the promise of LLMs is that they are flexible and adaptable to us, not that we should all conform to some standard that is ideal for them

harsh willow
#

👋 Just sharing my two cents: I prefer staying with the current docs structure. The most valuable feature for me is that in a single PR we can address both code and docs changes. This makes it extremely easy to ship features in a consistent way while keeping the flow simple.

From what I understand of the LLM arguments @echo jolt shared above, it doesn't seem to me that any of those points directly apply to how we're managing / structuring our docs, or justify moving them to a separate repo. Having said that, I'm not opposed to re-evaluating this as long as there's a solid argument to move forward.

So yes, I think along @torn epoch's line of thought here.

echo jolt
echo jolt
harsh willow
gloomy lintel
#

We've tried keeping docs in a separate repo. We've always moved docs back to where the code is because it's really hard to manage. So my vote is to keep docs where the code is. The other issues with AI can be solved by tuning those models properly.

harsh willow
harsh willow
#

awesome! @echo jolt are you ok with closing this thread then?

echo jolt
echo jolt
torn epoch
#

there's no current issue from what i can quickly find

harsh willow
echo jolt
harsh willow
#

I'll go ahead and close this thread. Thx everyone for your thoughts

echo jolt