Why are we not having a separate repo | Dagger | Page 1

edgy token Nov 17, 2024, 3:13 PM

#

Do you think we should? Why? (not saying we'll change, but we would welcome and appreciate you sharing your reasoning, of course)

echo jolt Nov 18, 2024, 3:13 PM

#

Separating documentation from the main code repository for open-source technologies offers a ton of advantages. Here's a breakdown, expanding on your points and adding a few more:

Increased Contribution Accessibility

Lower Barrier to Entry: Contributors don't need to set up a full development environment just to fix a typo or clarify a sentence. This makes contributing much more approachable, especially for non-developers like technical writers, students, or users who want to improve the docs.

Simplified Workflow: Editing Markdown files is straightforward and familiar to many. This streamlines the contribution process, encouraging more people to participate.

Improved Documentation Quality

Focused Reviews: Pull requests can focus solely on documentation changes, making reviews faster and more efficient.

Less Clutter: Dedicated documentation repositories avoid getting lost in the noise of code commits, making it easier to track changes and maintain version history.

Enhanced Organization: Separate repos allow for better structuring and organization of documentation, with dedicated sections, tutorials, and examples.

#

LLM-Ready Documentation

Structured Content: Markdown's inherent structure makes it easier for LLMs to parse and understand the content, enabling them to generate summaries, answer questions, and even create tutorials.

Metadata and Semantic Tagging: A dedicated repo allows for better use of metadata and semantic tags within the documentation, further enhancing LLM comprehension. This is exactly what Anthropic is doing to improve Claude's ability to access and utilize information.

Faster Updates and Easier Maintenance

Independent Release Cycles: Documentation can be updated more frequently, independent of the software release cycle.

Simplified Versioning: Versioning documentation separately makes it easier to align with specific software releases and provide accurate information.

Enhanced Discoverability

Dedicated Search Optimization: A separate documentation website or repository can be optimized for search engines, making it easier for users to find the information they need.

Examples

Many popular open-source projects are already adopting this approach:

Kubernetes: Kubernetes has extensive documentation in a dedicated GitHub repository.

React: React's documentation is also maintained in a separate repository.

In Conclusion

Separating documentation from the main codebase is a best practice that empowers open-source projects to have better, more accessible, and LLM-ready documentation. This leads to a more vibrant community, improved user experience, and ultimately, a more successful project.

#

😀 And in the dagger repo we are also maintaining archived docs, which is very confusing for LLMs too: an LLM often gives an answer for an older version when asked Dagger questions. That is too much friction for devs, and it creates more iterations rather than fewer when the goal is to work faster with AI.

#

📃 and we can have text or json ready for LLM like anthropic did...
Check this
https://x.com/alexalbert__/status/1857457290917589509?t=vlkmc9IH6jQ0YAWr5-egkw&s=19

Alex Albert (@alexalbert__) on X

Friday docs feature drop:

You can now access all of our docs concatenated as a single plain text file that can be fed in to any LLM.

Here's the url route: https://t.co/ILkO9q4fyk

lone mesa Nov 18, 2024, 3:40 PM

#

my $0.02 is that most of these points are about process and are not tied to mono vs poly repo setups

Some of the points made seem incorrect, for example the points about Markdown. Dagger docs use Docusaurus, a very popular documentation framework, which does use Markdown (technically MDX). You can also have independent release cycles with a monorepo; it's really about how you set up your CI and release process.

I've personally moved more towards having the docs site in the same repo as the core project. It makes maintenance easier imho
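
To illustrate the point about CI: in a monorepo, a pipeline can decide what to build from the paths a change touches, so docs can ship on their own cadence. A minimal sketch, assuming docs live under a top-level docs/ directory; the function name and the pipeline labels "docs" and "code" are hypothetical, not Dagger's actual CI setup.

```python
from pathlib import PurePosixPath

# Assumption: all documentation lives under this top-level directory.
DOCS_ROOT = "docs"


def pipelines_for(changed_files):
    """Return the set of pipelines to run for a list of changed repo paths."""
    pipelines = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        if parts and parts[0] == DOCS_ROOT:
            # Docs-only change: publish the docs site, skip the code release.
            pipelines.add("docs")
        else:
            # Anything else goes through the regular code pipeline.
            pipelines.add("code")
    return pipelines
```

A change that only touches docs/ yields just the docs pipeline, while a PR that touches both code and docs triggers both, all from one repo.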

lone mesa Nov 18, 2024, 3:43 PM

#

echo jolt 😀 And in dagger repo we are also maintaining archive docs also which is very co...

Arguably, having all the versions of the docs available is beneficial for LLMs. If I'm on an older version of Dagger, I want the LLM to reference those docs, not the latest.

This problem seems more about how to supply an LLM query with the proper context, which is generally a challenge for a lot of reasons

echo jolt Nov 18, 2024, 4:00 PM

#

lone mesa Arguably, having all the versions of the docs available is beneficial for LLMs. ...

No, getting code suggestions for an older version from an LLM while using the latest version always breaks your code. So I think that's not good.

Just a feedback

lone mesa Nov 18, 2024, 4:00 PM

#

Getting the latest version of the docs when using an older version of the tool is the same problem, reversed direction

echo jolt Nov 18, 2024, 4:01 PM

#

lone mesa Getting the latest version of the docs when using an older version of the tool i...

But comparatively projects like dagger will always suggest staying with the latest version. And it's advisable to use the latest version for better features and performance...

lone mesa Nov 18, 2024, 4:01 PM

#

The LLM should know what version you are using and be able to reference that version of the docs

echo jolt Nov 18, 2024, 4:02 PM

#

lone mesa The LLM should know what version you are using and be able to reference that ver...

But that's a little bit tricky.
It's not always in our hands to train them to give suggestions for a specific version.

lone mesa Nov 18, 2024, 4:03 PM

#

That's what RAG & agents are for

#

for example, we do not have control over when foundational models are trained, nor what their training data is (when did they last scrape the docs...?)
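
A rough sketch of the "supply the right version as context" idea discussed above, not a full RAG system: pick the docs that match the installed tool version and put them in the prompt. The function names, the version strings, and the `docs_by_version` mapping are all hypothetical.

```python
def version_key(version):
    """Turn a string like 'v0.2.3' into a comparable tuple (0, 2, 3)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def pick_docs_version(installed, available):
    """Pick the newest docs version that is not newer than the installed tool."""
    wanted = version_key(installed)
    candidates = [v for v in available if version_key(v) <= wanted]
    if not candidates:
        raise ValueError(f"no docs available for {installed}")
    return max(candidates, key=version_key)


def build_prompt(question, installed, docs_by_version):
    """Build an LLM prompt whose context is limited to the matching docs."""
    version = pick_docs_version(installed, docs_by_version)
    context = docs_by_version[version]
    return (
        f"Answer using only the docs for version {version}.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

With this shape, keeping archived docs around helps rather than hurts: a user on an older release gets the older docs, and a user on the latest release gets the latest.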

echo jolt Nov 18, 2024, 4:06 PM

#

lone mesa for example, we do not have control over when foundational models are trained, n...

Not every developer who is using Dagger can have all these things implemented on their machine...
Think about all the end users of Dagger.
Think about the community, not only expert developers.

lone mesa Nov 18, 2024, 4:15 PM

#

Foundational models are Claude, Gemini, OpenAi and don't exist on developer machines
There are tools like Kapa being built https://www.kapa.ai/ to fill this gap

kapa.ai - Instant AI Answers to Technical Questions

Kapa.ai turns your knowledge base into a reliable and production-ready LLM-powered AI assistant that answers technical questions instantly. Trusted by 100+ startups and enterprises incl. OpenAI, Docker, Mapbox, Mixpanel and NextJS.

echo jolt Nov 18, 2024, 4:16 PM

#

lone mesa Foundational models are Claude, Gemini, OpenAi and don't exist on developer mach...

But at global level developers use GitHub copilot more

torn epoch Nov 18, 2024, 4:17 PM

#

echo jolt Separating documentation from the main code repository for open-source technolog...

I don't understand why these points aren't true with the current docs structure.

All the docs are under docs/ - just like they'd be under dagger/docs on github - they're in a separate place.

Having docs in a separate repo also comes with a significant cost: making code+docs changes in parallel is much more difficult. This is something we already struggle with, and I don't want to make that problem worse. Additionally, it makes our automation harder; there are some very non-trivial dependencies (e.g. automagically generated docs from the code).

Moving the docs doesn't automagically give us all these benefits - we'd need to do restructuring, decluttering, etc. The decision to do that isn't tied to a separate repo.

edgy token Nov 18, 2024, 4:19 PM

#

echo jolt But at global level developers use GitHub copilot more

We are exploring a GitHub Copilot extension based on RAG from docs and examples. Agree it could be helpful.

torn epoch Nov 18, 2024, 4:20 PM

#

We attempt to use a monorepo-like structure for our own codebase - it makes it much easier for us as core devs + docs folks to hack on things in parallel. If we decide to take on the extra cost of splitting it out, we need to have a concrete benefit to the team - this isn't a user facing feature

lone mesa Nov 18, 2024, 4:20 PM

#

Docusaurus is popular, so the LLMs ought to be trained to recognize it and handle it accordingly. In other words, the promise of LLMs is that they are flexible and adaptable to us, not that we should all conform to some standard that is ideal for them

harsh willow Nov 18, 2024, 4:28 PM

#

👋 Just sharing my two cents about preferring to stay with the current docs structure. The most valuable feature for me is the fact that in a single PR we can address both code and docs changes. This makes it extremely easy to ship features in a consistent way while keeping the flow simple.

From what I understand of the LLM arguments @echo jolt has shared above, it doesn't seem to me that any of those points directly apply to how we're managing / structuring our docs, or justify moving them to a separate repo. Having said that, I'm not opposed to re-evaluating this as long as there's a solid argument to move forward.

So yes, I think along @torn epoch 's line of thought here.

echo jolt Nov 18, 2024, 4:31 PM

#

echo jolt 📃 and we can have text or json ready for LLM like anthropic did... Check this h...

Or we can do something like this with every release for each programming language either in text or json
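
A minimal sketch of what such an export could look like, assuming the docs are Markdown/MDX files under a single directory. The function name, the separator format, and the paths in the usage note are illustrative assumptions, not an existing Dagger tool.

```python
from pathlib import Path


def concat_docs(docs_dir, out_file, suffixes=(".md", ".mdx")):
    """Concatenate all docs pages under docs_dir into one plain text file.

    Returns the number of pages written.
    """
    docs_dir = Path(docs_dir)
    # Sort so the output is stable from one release to the next.
    pages = sorted(
        p for p in docs_dir.rglob("*") if p.is_file() and p.suffix in suffixes
    )
    with open(out_file, "w", encoding="utf-8") as out:
        for page in pages:
            rel = page.relative_to(docs_dir).as_posix()
            # A header line per page lets the reader (or LLM) see page boundaries.
            out.write(f"===== {rel} =====\n")
            out.write(page.read_text(encoding="utf-8").rstrip() + "\n\n")
    return len(pages)
```

Running something like `concat_docs("docs", "docs-full.txt")` as a release step would produce one file per release. It works the same whether the docs sit in the main repo or a separate one.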

echo jolt Nov 18, 2024, 4:32 PM

#

harsh willow 👋 Just sharing my two cents about preferring staying with the current docs stru...

We can change code and docs both at the same time even if they are in separate repos, with either Git submodules or a VS Code multi-root workspace

harsh willow Nov 18, 2024, 4:32 PM

#

echo jolt Or we can do something like this with every release for each programming languag...

yes, that's fine. Users have requested this already. This doesn't mean that we need to move our docs to a separate repo to make this happen

echo jolt Nov 18, 2024, 4:32 PM

#

harsh willow yes, that's fine. Users have requested this already. This doesn't mean that we ne...

Agree

gloomy lintel Nov 18, 2024, 4:34 PM

#

We've tried keeping docs in separate repo. We've always moved docs back to where the code is because it's really hard to manage. So my vote is to keep docs where the code is. The other issues with AI can be solved by tuning those models properly.

harsh willow Nov 18, 2024, 4:35 PM

#

echo jolt We can change code and docs both at same time even if they are in separate repo,...

yes, I'm not saying it's not possible. I'm saying that it's easier and simpler to keep things how they are. So far I personally haven't seen any argument that justifies making this change.

echo jolt Nov 18, 2024, 4:36 PM

#

harsh willow yes, I'm not saying it's not possible. I'm saying that it's easier and simpler t...

Yes cool. Agree

harsh willow Nov 18, 2024, 4:36 PM

#

awesome! @echo jolt are you ok on closing this thread then?

echo jolt Nov 18, 2024, 4:36 PM

#

echo jolt Or we can do something like this with every release for each programming languag...

Any issue to track this or should I create a new one?

echo jolt Nov 18, 2024, 4:38 PM