#Github Copilot Agent
@mild roost do you mean a bad idea for Github to launch this? Or a bad idea for Dagger contributors to use it?
I think it's legally compromised.
Oh I see
the whole situation with the lawsuits over model theft of content from all over the internet is one part.
Lawyers who use Office 365 are suing because it's exfiltrating legal documents that were made in confidence. Even if they have guardrails to keep documents from leaking out of their practice, they don't have any guardrails to prevent leaks between clients in the same practice, which destroys the protections that maintain confidentiality.
two things you don't mess with: confidentiality and clients' money. more lawyers are disbarred over those two issues than any other.
Like I don't mind rules-based code generators but I've seen these tools just lift things from code with the wrong license and I'm really not a fan of the exposure.
(by wrong license, I mean any of the so-called "viral" licenses)
not to get lawyery, but copyright infringement does require intent to copy, and stealing GPL code through an AI introduces a ton of doubt into an already-very-difficult-to-prove-intent situation
put another way, we'll see if M$'s massive army of IP lawyers get more work to do because of this, but I'm kinda doubtful users will have issues bigger than unenforceable C&Ds
@mild roost @slate pawn @versed glade
I may be wrong about the context here, but if you don't want to send your data, you can turn that off in the GitHub Copilot settings...
At our company we have a policy against using new systems until we know the controls (in the sense of ensuring compliance) and they pass IQ/OQ validation proving compliance. We have documentation requirements tied to 21 CFR Part 11 (FDA).
There's a whole video on the Lawful Masses YouTube channel about the lawyer stuff I mentioned above. In their profession, the intent doesn't matter so much as the result in the situation above. Also, the video is about the project manager for 365 not understanding that, and Microsoft moving fast and breaking things in the process, all while apparently not asking lawyers if this use case could cause conflicts.
Microsoft has turned Copilot Generative AI features on for everyone in the product formally known as Office. Does Microsoft 365 with Copilot violate lawyers' duty to protect client confidential information? Ben Schorr from Microsoft responds - poorly.
we have a similar confidentiality rule where someone working on a project for $redacted_a cannot take findings from unpublished work there over to $redacted_b
@mild roost I am talking about using such AI in an open source project, where someone is building 100% open source technology and has no confidential code. In that case I think it can help...
And this is Github Copilot and not Microsoft Copilot...JFYI
Don't ignore licenses. Or better put: ignore licenses at the cost of exposure
That means using devin is fine and github copilot is not fine?
just want to learn so i asked, that's it
I don't think any of it is fine from this perspective. They take in (appropriate) content they are not licensed to use.
there are many legal battles being fought over this on multiple fronts.
also, GoLand has some "ai" parts, and they recently broke the inline tool by trying to integrate into their "ai" stack. or google's use of gemini v1 as an assistant replacement that couldn't do the one thing for me that I do with assistant: set a timer.
but gemini 2 flash and claude have proven in our internal testing that they are very good for many major bug fixes and many enhancements in code etc
copilot 365 is 100% a compliance/confidentiality/trade secrets nightmare, I'm just doubtful about the GPL-bleed thing that's always the first thing software engineers jump to.... copyright law vs bar ethics vs trade secrets are all relatively orthogonal here. the craziest part of it in trade secrets law is that trade secrets are only trade secrets when you make "reasonable efforts" to protect confidentiality and there is a very strong argument to be made that if you feed your trade secrets to a 3rd party LLM you're not making a reasonable effort to keep it secret... then bam, your 150 year old top-secret coca-cola recipe is public domain. the bar has an even lower standard, "reasonable" does not apply, it's just don't break confidentiality period.
Anyways it's now your or dagger team's call to evaluate and use for faster development or not. Thanks for your time and knowledge.
@mild roost
Microsoft owns GitHub, it's the same company
With most of the large LLM vendors, you can buy plans where they do not train on your data. For example, this is part of paid Google Workspace accounts and Vertex AI
I propose we explore leveraging advanced Large Language Models (LLMs) like Gemini and Claude Sonnet to accelerate Dagger's development. Both offer potential advantages, and evaluating them is crucial for maximizing value, especially in the open-source context where rapid delivery is key. It's also important to consider the data handling practices of each provider.
- Gemini: GitHub Copilot uses Gemini 2.0 Flash hosted on Google Cloud Platform (GCP). As documented by GitHub, "Gemini doesn't use your prompts, or its responses, as data to train its models." Prompts and metadata are sent to GCP. https://docs.github.com/en/copilot/using-github-copilot/ai-models/using-gemini-flash-in-github-copilot#about-gemini-20-flash-in-github-copilot
- Claude Sonnet: Amazon Bedrock, which hosts Claude, states, "Amazon Bedrock doesn't store or log your prompts and completions. Amazon Bedrock doesn't use your prompts and completions to train any AWS models and doesn't distribute them to third parties." https://docs.github.com/en/copilot/using-github-copilot/ai-models/using-claude-sonnet-in-github-copilot#about-claude-35-sonnet-in-github-copilot
I urge the Dagger core team to assess the potential of these LLMs for integration, considering factors like performance, cost, data handling practices, and any other relevant implications. Thank you.
If AI could understand the BuildKit code base, I would be impressed. Even I had a hard time learning it because it has near zero comments and is rather complicated overall
then we should use AI to add comments to it...
AI comments tend to just explain the steps of the code, not the conceptual logic or why things are done as they are
These types of comments are noise and don't help people
We can have two types of comments in the code: one explains what the code does and its details, and the other explains why, i.e. the decision behind the code.
We can prioritize such work and get started.
But we should also start adopting it for open source projects, where everything is open. GenAI can give so much value to such projects.
They still hallucinate too much, they are still very elementary in their explanations, it's why projects don't adopt AI more
ok, can you please tell me which folder has the main source code in buildkit? https://github.com/moby/buildkit
I think figuring that out will help you understand why I wouldn't want to work on that codebase
You're going to have to understand the code in order to know if the AI is doing a good job or not
I have uploaded the dagger engine code into gemini to get code enhancement comments, and here it is @versed glade
Just one example that shows how these AI have no idea what they are talking about
Most of the advice in there is very generic, which is expected because transformers are giant averaging machines that predict the most likely next token(s).
I understand some feedback on AI might be less helpful. However, I believe the vast majority (around 99%) offers valuable insights. Rather than dismissing AI's potential based on a small percentage of less useful feedback, shouldn't we prioritize leveraging the significant value it can bring to Dagger's users? What's your perspective?
My perspective is that they are wrong more often than right, based on my usage. 99% is a big overestimate; I don't recall anything I would consider an insight in the video you shared.
On top of that, it made multiple errors and incorrect interpretations of the code. This only reinforces my opinion these things are not ready for prime time.
I'm with Yann LeCun when he says we need to move beyond transformers to live up to the hype
I just want to learn so i am asking...
As you mentioned, these are generic comments, so I wonder
In the future, should we consider upgrading the dagger code as suggested by gemini in this video? Even for a few of the suggestions...
If the answer is yes then no discussion required, and if the answer is no should we rethink?
What are your thoughts on devin adoption? https://dagger.io/blog/new-ai-developer-devin
Check SWE from github here https://www.youtube.com/watch?v=VWvV2-XwBMM&embeds_referring_euri=https%3A%2F%2Fgithub.blog%2F&source_ve_path=MjM4NTE
A lot of companies have invested a lot of money into AI, they are going to overpromise and try to sell whatever they can to make that money back. Many others are on the AI marketing train, because that is what is vogue right now. You can show cool examples, but when tested outside of constrained examples, they all fail to live up to the promise.
re: the Dagger blog post and PR, I wonder if @versed glade noticed that they committed a 42 MB binary to git...?
https://github.com/dagger/dagger/pull/9130/files#r1881196667
More context: https://github.com/dagger/dagger/pull/9134
Looks like that commit no longer exists in git history...
https://github.com/dagger/dagger/commit/21bcbea6dbb2e13f434b0561eb8a927bd2f7a1f0
The main thing I've learned from this AI hype cycle is that people really really want to outsource their thinking and the wizard behind the curtain has too many people believing these tools are ready for it, which they are not
I have to write secure software that checks its assumptions to prevent the kind of stupid errors that copying example code causes. Why would I settle for a 99% solution, let alone a technology that regularly miscounts the r's in strawberry?
Reinforcing your point but you mentioned it not even being close to 99%
Also, I do not want comments about the what in code unless it's something like a weird hand-rolled loop. Even then, I don't want a redundant retelling of the steps; that's what the code is for. I want to know why. Why are you ANDing this value with that? What does a variable represent in a function? Where did this constant come from? What is this technique called?
Like, I hope everyone understands what the "reasoning" steps in these new models are for. They're there to prestidigitate the "impressive" thinking behind the wrong answer. Just ask any of the explanatory mode models to give you a random number and explain it.
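To make the what-versus-why distinction concrete, here is a minimal sketch. It is an illustration only, not code from Dagger or BuildKit: the function name `fnv1a_32` and the constant names are my own, and the algorithm is the well-known 32-bit FNV-1a hash. The comments answer the questions above (where the constant came from, why the AND is there, what the technique is called) rather than narrating the steps.

```python
# Technique: FNV-1a, a simple non-cryptographic hash (32-bit variant).

# Why these values: they are the published 32-bit FNV offset basis and
# FNV prime, not arbitrary magic numbers. Changing either one produces
# hashes that no longer match other FNV-1a implementations.
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193

# Why a mask: Python integers are unbounded, so we AND with 2**32 - 1 to
# reproduce the wraparound a 32-bit unsigned multiply has in C or Go.
MASK_32 = 0xFFFFFFFF


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of `data`."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        # Why XOR before multiply: that ordering is what makes this FNV-1a
        # rather than FNV-1; the two variants give different results.
        h ^= byte
        h = (h * FNV_PRIME_32) & MASK_32
    return h


# A "what" comment, by contrast, only restates the code and adds noise:
#   h ^= byte                # xor h with byte
#   h = (h * P) & 0xFFFFFFFF # multiply h by P and AND with 0xFFFFFFFF

if __name__ == "__main__":
    print(hex(fnv1a_32(b"a")))
```

The code is identical with either style of comment; only the first style tells a maintainer what would break if they "simplified" the mask or swapped the constants.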
It goes on about lucky numbers and how 42 is used in popular culture and how a number less than ten doesn't look random enough and other word salads.
We did notice the next morning 🙂 I thought it was funny that agents make the kinds of mistakes a human might make.