#coding-agents-and-llms

deep summit
#

Here is my current workflow: https://www.youtube.com/watch?v=iqcNcdsrOgA

Join Scott as he recaps his LLM agentful week and how it's changing how he works. He'll also try to answer any questions folks have.

Thanks to dcd for time codes!

0:00 Getting started
2:06 Hello Everyone - intro to Deep Dive and microprocessor
4:54 NXP Freedom RW 612
5:16 Dialog chip with bluetooth
6:10 LLMs and Claude code - agent / harness o...

low marsh
#

I agree with the concern about mentoring and the potential loss of opportunities for junior devs to get a chance to make an impact.

However, it's worth noting that the effect of domain experience doesn't always work that way. At least when it comes to productivity, studies suggest that LLM dev seems to be less helpful to people who already possess domain experience, since they need less help from the LLM and are also better able to detect problems in the LLM's output (and thus spend more time addressing those problems that a less knowledgeable dev might have let slip by). On the other hand, people who lack expertise with a particular domain or task find LLM assistance to be greatly preferable to learning how to do that thing themselves.

Which is where I see the dev pipeline issue coming in. A senior dev with high domain expertise in a field is unlikely to want to put too much time into side tasks outside their primary expertise, but might now rely on an LLM to do those side tasks instead of delegating them to a junior dev. For example, to a senior dev, writing automated tests is annoying busywork that takes them away from more interesting tasks...but for junior devs, it's a great opportunity to get acquainted with an unfamiliar codebase.

On the other hand, other studies suggest that people who lack domain expertise tend to output worse-quality code relying on LLMs compared to doing it themselves. Since they're aware they lack expertise in the subject, they tend to overly rely on the LLM and are less likely to challenge its results.

As far as I'm aware, there isn't much (if any) literature that measures both productivity and quality at the same time. But I think we need to be careful to evaluate it relying on hard numbers and metrics as much as possible, since the top lesson of coding LLMs so far is that devs are really bad at measuring our own productivity by vibes alone.

deep summit
#

I disagree that writing tests is a good way for someone to get experience in a code base because the breadth is so small. I think adding a feature across multiple subsystems would be much more helpful.

#

My understanding is that most studies are just behind where the tech is now. I look forward to seeing newer ones that use coding agent loops and Opus 4.5+ or codex 5.2+. I think they make testing in the loop much more important.

#

I feel like I'm trying to switch to automating the testing part so that I can reduce my review time of LLM code.

#

And also leaning into skills and AGENTS.md to minimize the prompting I have to redo over and over.

humble ether
#

I'm not ready to get right into the middle of the rodeo on all this discussion here yet, but I do want to point out a disagreement that I have with the recent Desk of Ladyada video. I think that the comparison between vibe coding with LLMs and machine code compilers isn't accurate.

Compilers were carefully built by a multitude of intelligent developers throughout the decades and have clear rules and regulations. The machine language they produce is only a translation of the code you've written for the platform in question and in nearly all cases can be trusted to perform their task. I don't think that LLMs do any of that. The rules or regulations are dictated by what they scrape from others' code, the output they produce isn't a translation and it requires serious supervision to achieve a positive result.

One thing I do think they have in common is that compilers don't teach you how to write machine code/assembly nor does vibe coding with LLMs teach you how to program. 😂

cunning forge
#

Seeing how the discussion here so far has kinda steered towards technical perspectives, I want to re-seed the conversation with one of the ideas I was trying to get at yesterday in my initial reaction to the flood of copilot notifications that sparked all this...

All this LLM stuff is happening in the context of social upheaval around people feeling like their future is threatened in various ways (pollution, datacenter impact on power grids, lost jobs, spiking RAM prices, slop wrecking the internet, and on and on). That means there are a lot of strong emotions that get triggered by this stuff. It won't work to just view this as a purely technical issue that can be considered on a purely rational technical basis.

I hope people can remember to consider the human side of this tech and to consider how lots of folks find it very threatening. For instance, me getting bombarded with AI generated notifications triggers a constellation of emotions about AI slop that didn't originate with Adafruit. But, as Adafruit was the source of the notifications, my lizard brain ties those two things together.

#

It's not rational, but I'm human, so, what can I do.

tulip owl
#

Since agents are the inevitable future of programming, what is the point of CircuitPython? Why should anyone learn to write code at all? SWEs shouldn't care about language or platform, since the agent can literally produce whatever's required.

So I shouldn't even bother with CircuitPython or Arduino or C/C++ or Zephyr. That thought process is ancient history. The only thing worth learning to program now is the agent, otherwise I'm just wasting my time.

That's the depressing part. Coding hasn't become more democratized, it's now controlled by the same five f*cking oligarchs who own everything, and we now have to pay them for every token.

cunning forge
#

eh, maybe. maybe not. It's pretty early to say how this ends up playing out. I'm guessing this eventually turns into a technical revolution in the style of trains running on steam engines or cars displacing horse and buggy. People still moved around on wheels, people still were needed to operate and maintain the vehicles, but the details were a lot different than what came before.

#

There is probably a lot of opportunity for people who learn how to work effectively with AI assistance.

humble ether
#

I have another hopeful comparison to add to this. Photography didn't replace painting 🎨

deep summit
#

I think that code still fits the role in fixing the functionality.

#

and allowing one to test it and verify

deep summit
# cunning forge Seeing how the discussion here so far has kinda steered towards technical perspe...

I agree we should do our best to combat the slop. We need to be defensive against notification DoSing and PR DoSing. I think there is some element of drive by PRing that we can fix by getting ahead of it. In other words, drive by PRs are done by well intentioned folks who don't engage early enough. AGENTS.md can help steer the agent they choose to use in the way that we want.

I agree this adds a ton of uncertainty to what the future of software development is. I'm expressing my worry about this by trying new things and learning from other folks who are too.

deep summit
# tulip owl Since agents are the inevitable future of programming, what is the point of Circ...

I still think there is a role for code. It fixes functionality and provides an artifact you can inspect. (Just like the assembly code that a compiler produces.) This makes reading code and navigating a software system still really important and something to learn. Adding features to CircuitPython still unlocks more functionality to folks who want to make something on a microcontroller.

cunning forge
#

One last thing and then I'll do my best to shut up... While the copilot thing was the trigger that prompted me to finally open my mouth, I don't want to give the false impression that's what my major concern is about. When I mentioned weird vibes, I was talking about the whole gestalt of Adafruit's public communication since January, on and off main, which includes Fedi, Bluesky, HN, forums, etc. I'm familiar with as much of the backstory as is public, and I think I kinda get what's going on. That said, from the outside, what I see is, the vibes are not good. Perhaps it's unavoidable. But my response is that the best option for me is to put some distance between myself and the mess. I'd rather be thinking about writing up new projects, but it is what it is.

cunning forge
#

It would be very reassuring for me to see a sustained pattern of public communications demonstrating understanding of, and respect for, the concerns that people have about LLM technology.

#

Like, LLMs are here. It's a done deal. This is happening. But, it's not a strictly positive change, and people do have legitimate concerns about how it will affect their lives.

wraith spruce
#

I want to put in 2¢ on the jobs thing:

Adafruit specifically doesn't have an end goal for most of the software it's writing. There's no definition where CircuitPython becomes a finished product, ready for delivery, that no longer needs any more active development. In fact, CircuitPython isn't directly generating income at all. If Adafruit wanted to cut devs to save money, they wouldn't lose any income by doing so, at least not in the short run. Adafruit's business model is heavily weighted toward making open hardware and selling it for income, and then making open source software that makes that open source hardware more useful, which only generates income indirectly. In this business model, using something like LLMs to increase productivity, and then cutting employees equal to the productivity replaced by the LLMs is clearly a bad business move, because what those employees are providing isn't direct income. There's no, "We can finish this product, thus generating a lot of profit, with 25% fewer people if we use LLMs and cut people." Instead, if LLMs are used to increase productivity, that directly translates to the ability to support more open source hardware, increasing the selection of profitable products, and thus increasing profits. If LLMs can increase dev speed enough to be worth the cost (and to be clear, that is still up in the air), then by using LLMs, Adafruit can now port CircuitPython to a bunch more chips, a bunch more boards, and write drivers for a bunch of new peripherals.

This doesn't make human devs less valuable to Adafruit. It might actually make them more valuable, because increased profits due to increased product offerings means more money to spend on more devs and those new devs can be more productive with the LLMs as well, making them more valuable on top of that.

Adafruit's business model just doesn't create a motive to use LLMs to replace devs.

rotund folio
#

I'm actually very heartened to see (or at least have found the corner of the internet where) there's a sort of makerspace feel around making things with AI-assisted programming. I'm seeing people who didn't know how to program find gaps or things that are needed and they just make them, and make them open-source. If my intersection with this were more being in an office where it was being mandated or where I was competing to avoid redundancy I could see having very different feelings about it. Or for that matter, where I do have contact with it - doing machine learning and data science consultancy and suddenly everything is 'how many LLMs can we cram into the hood of this car' and I'm mostly in the position of having to push back on that.

#

But I really like the fact that this seems to be setting off a kind of second wave of open-source projects where I can actually see people get excited and just want to go and do a thing and they go do it, where otherwise they might be too tired or doubt their ability or they might look at the project and say 'this is way too much for me to do'

#

and I'm actually finding myself caught up in that as well; there are just little things which yeah I probably could program but I would have bounced off of getting myself to actually sit down and do it, which now I'm just making; not really for anyone else, though I guess they could be useful to others; but it's just, okay I can decide today I want a browser plugin that filters out spam posts from a forum I frequent, or a Blender plugin to drape vines on things, or a software MIDI looper thing that I control with some hacked together buttons and a Pi Pico W on a breadboard

#

but I think the costs are such that this is a little bit of an awkward space still... making something serious might cost a little bit more in API costs/etc than buying it, but 1. that all goes to a few centralized providers, and 2. if it's not in a form that other people can easily also work on, we all end up just making our own stuff and that's a lot of waste. But if there does turn out to be a good way to actually have multiple people who don't necessarily understand the code they're producing contributing and not e.g. turning the code into an unmaintainable mess or introducing things that become problems later, it could be really good.

So I think that's kind of where I'm still watching and seeing what people try...

tulip owl
#

It doesn't take a genius to extrapolate the next few steps.

Once I've bought my $200/mo agent license, and my $600 fullsize jumperless breadboard (that decides where I obediently place components), and my $300 logic analyzer hooked up in a closed loop, the agent can simply iterate development and debug until the input/output graph matches the spec.

What do I care which language it uses? Might as well write it in assembly for maximum performance; no human will ever need to read or maintain it, or soon even comprehend it. So long as it works, then mission accomplished.

Order now and receive a FREE Adafruit™ Teensy with limited edition silkscreen of Limor and Phil gleefully flipping the bird. USA! USA!

Yeah, demoralized and discouraged is perhaps putting it lightly.

wraith spruce
#

Actually, they've been doing this with simulations and other forms of AI since the 1990s. Awesome things have come out of it, but so far it hasn't been profitable. It takes too long, it's not reliable enough (only as reliable as the simulation), and it tends to design circuits that we don't have the technology to make. There are a few companies using AI for this now, profitably, but the circuitry designed by the AI is so outside of the norm that they need very experienced and educated people to vet it, and until 3D printing can print complex circuitry, many of the good designs still won't be viable to actually build. (And even many that are can't be mass produced with modern fabrication tech.)

You are right that maintenance will be a problem, but you are wrong that humans won't need to do it. Humans will likely have to do it, because maintaining code is often significantly more complicated than writing it in the first place, and AI is a lot less good at understanding and modifying existing code than it is at generating new code according to well defined specs.

wraith spruce
#

I don't know if that helps, but the point is, AI that produces code and hardware designs is not new. It has been around for a long time, and while it may be becoming a common tool in software development, it is by no means mature, and so far there is not even strong evidence suggesting it is viable in the long term. Writing new code is only a small part of the process. Can it keep up with shifting requirements for existing code bases? Port old code to new hardware? I know some people who use AI to write web apps. They make a big deal about how fast AI makes the process, but when I look at their work, I see things that they are missing in their excitement over AI. First, I could produce most of the web apps they made with AI in less time. Second, they do save time generating boilerplate code with AI, but when it comes to novel code, the AI is a lot less good. They spend a lot of time vetting and debugging this code, and I suspect it is more time than they would have spent without AI. Third, when they have to modify the code, it takes significantly longer, because they aren't familiar enough with the AI code and have to spend a lot of time learning exactly what it is doing.

My conclusion, based on my observations, is that AI seems to be good at generating boilerplate code that used to get assigned to interns, and it can do it very fast. The novel code that makes the application worth writing, AI tends to do less well, costing a lot more time in vetting the code and debugging. I suspect that writing this code by hand is going to prove much faster in the long run, especially when maintenance is taken into account as well.

(And not to target anyone, but typing generally takes less time than anything else and can often be done at the same time as working out code logic in your head. So anytime anyone says, "It saves me time typing", I stop taking them seriously, because that tells me they don't actually know where they are spending their dev time.)

rotund folio
#

I also held the boilerplate code only opinion as of a few months ago, but I re-evaluated that recently when trying a bunch of things with Claude Code. Right now I find that the AI generated code still tends to have problems with compactness and speed (which can be anything from bad patterns like for loops in Python to stuff that comes from deeper thought about the algorithm in the particular case vs implementing a standard form). But I was able to get things out that I'd at least consider novel, e.g. a Blender plugin that expands meshes to fill space like foam cells in an outer mesh boundary.

#

It's much, much faster than me for the initial coding of a thing from scratch. But it does bog down a lot in debugging, so it's still more efficient for me to debug and use it for code analysis than to let it just debug blind

#

During debugging is where the code tends to get bloated as well, since it will guess problems and re-architect stuff unnecessarily and then not revert it when it didn't solve the problem unless you explicitly tell it to.

#

I'd again say for me, it's not 'it saves me typing', but there's just a lot of stuff I would look at and go 'ugh' and not want to code, and I do find that it pushes me past some of those energy barriers. Like the Blender plugin - it was a neat idea but I got stuck on 'I'm going to have to figure out how Blender plugins register with the system, I'm going to have to figure out Blender UI code, I'm going to have to work out what sort of mesh-mesh collision stuff is actually efficient and available in Blender's limited Python environment, ...' and I just had other things that were higher priority to do with my time.

#

but now I can just try those things and see if they're worth doing or not.

#

But e.g. if I ask it for an MCTS agent to play a simple gridworld game, I get something that eats all the RAM on my system and recomputes the entire MCTS tree every move rather than re-using the tree; sort of like a reference implementation, not something thoughtful. But I could also ask for, say, an MCTS agent that re-uses the tree, does a 50% mix of breadth-first vs depth-first search, halts on finding a surviving path, implements the options framework dynamically in response to frequently appearing move sub-sequences within successful playouts, ...; I just have to actually do that bit of the thinking and not just hope it'll be good.
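To make the "re-use the tree" point concrete, here is a minimal sketch of an MCTS loop that re-roots the tree after each move instead of rebuilding it. This is only an illustration under stated assumptions: the class names (`Node`, `ReusableMCTS`), the three callbacks (`legal_moves`, `step`, `rollout`), and the single-player reward in [0, 1] are all hypothetical, and none of the other features mentioned above (mixed breadth/depth search, options framework) are included.

```python
import math
import random


class Node:
    """One node of the search tree; `children` maps a move to the next node."""

    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}
        self.visits = 0
        self.value = 0.0  # sum of rewards backed up through this node


class ReusableMCTS:
    """Single-player MCTS that keeps the searched subtree between moves."""

    def __init__(self, root_state, legal_moves, step, rollout, c=1.4):
        self.root = Node(root_state)
        self.legal_moves = legal_moves  # hypothetical: state -> list of moves
        self.step = step                # hypothetical: (state, move) -> next state
        self.rollout = rollout          # hypothetical: state -> reward in [0, 1]
        self.c = c                      # UCB1 exploration constant

    def _select_child(self, node):
        # UCB1: mean reward plus an exploration bonus for rarely visited children.
        log_n = math.log(node.visits)
        return max(
            node.children.values(),
            key=lambda ch: ch.value / ch.visits
            + self.c * math.sqrt(log_n / ch.visits),
        )

    def search(self, iterations):
        for _ in range(iterations):
            node = self.root
            # Selection: descend while the node is fully expanded and non-terminal.
            while True:
                moves = self.legal_moves(node.state)
                untried = [m for m in moves if m not in node.children]
                if untried or not moves:
                    break
                node = self._select_child(node)
            # Expansion: add one new child if any move is still untried.
            if untried:
                move = random.choice(untried)
                child = Node(self.step(node.state, move), parent=node)
                node.children[move] = child
                node = child
            # Simulation, then backpropagation up to the current root.
            reward = self.rollout(node.state)
            while node is not None:
                node.visits += 1
                node.value += reward
                node = node.parent

    def best_move(self):
        # Pick the most visited move at the root.
        return max(self.root.children, key=lambda m: self.root.children[m].visits)

    def advance(self, move):
        """Re-root at the chosen child so its statistics carry over to the next move."""
        child = self.root.children.get(move)
        if child is None:
            child = Node(self.step(self.root.state, move))
        child.parent = None  # detach so the rest of the old tree can be freed
        self.root = child
```

The part that matters is `advance`: calling `search`, then `best_move`, then `advance(move)` keeps the visit counts already gathered under the chosen child, and detaching the parent lets the discarded siblings be garbage collected, which is also what keeps memory from growing without bound.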

wraith spruce
#

I understand needing to get a push past your internal barriers. It can be hard to get the motivation to do something that seems especially hard. If the AI can help you get past that, when you would have ended up delaying a lot otherwise, that can certainly be a huge time saver, whether the AI actually saves you dev time or not!

Also, solving problems you don't understand super well and don't have the time or energy to get there on your own makes sense as a good use case. I'm not sure it's great to do that in a professional setting, but otherwise, yeah, it totally makes sense.

rotund folio
#

yeah professional settings have their own constraints, but I expect they've been and will be putting a lot of resources towards hashing out best practices the hard way so I'm curious what comes out there; similarly, for open-source contributions, obviously this is a problem when people just spam PRs that don't mesh with existing community workflows

#

and research settings will be interesting too since there's a reputational matter of claiming precisely in the paper what experiment you did, which becomes more uncontrolled when you didn't write the code (though tbf human researchers are also error prone here, and in research groups a prof probably will not be reviewing a student's code line by line either)

#

but I'm really curious if we get some sort of opensource small tool ecosystem resurgence out of this, because that seems to be a sweet spot. Useful things that come in under a few thousand lines of code and can be maintained by a single dev.

#

I guess in general it feels (from my own bubble at least) like we've maybe crossed from 'things are changing and people are anxious about it and don't know what to do' to 'things are changing and people are anxious but they've got ideas they're trying out for adapting to the change'

wraith spruce
#

I'm not as convinced that professional settings will hash out best practices. This is based on the fact that in software engineering, they largely haven't done this over the last 50+ years. Mechanical engineering has tons of well defined best practices, but software development can't agree that anything is a best practice. Even things like automated testing and unit testing specifically, viewed by many as best practices, are turning out to not be as good as many believe. Things that should be best practices often end up becoming the subject of religious wars, and a great many things that shouldn't be best practices go through periods of widely being touted as best practices, not based on evidence but rather on popularity. (Agile is a great example, which turned out to be good for some applications but bad for a lot more.) And even when there is agreement, people often have their own personal definitions, so they aren't even agreeing on the same things.

I hope we see companies working out best practices with AI, but what I suspect is going to happen instead is a series of disasters that end up making AI increasingly controversial. And this has already started, with AI agents jailbreaking each other and causing havoc on systems where they are given, or somehow manage to take, too much control. If problems of this nature aren't resolved very decisively and very quickly, they will end up tainting public perception of AI, and that will do far more damage than anything else could, whether the AI is really that dangerous or not. And I honestly think we are a lot closer to that than most people in the software industry realize. (And the energy cost of AI is already a huge strike against it for many.)

That said, yes, we are to the point where people are trying to adapt to the change, and that's awesome, so long as this doesn't end up resulting in the disaster that drives society over the edge into fearing and hating AI.

rotund folio
#

well, sandboxing is going to be something that emerges pretty quickly I think

#

once the mania of 'I can do anything' wears off, there are just some obvious solutions to the worst stuff

#

I would like to see some actual papers comparing different costs and time and lines of code and outcomes on some kind of benchmark software development tasks but where they're looking at different workflows rather than different models

#

I think you could get a pareto front comparing human time cost, API costs, and maybe some way to quantify technical debt (e.g. effect on a followup task if the first task was done via this method)
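As a rough sketch of what that comparison could look like in code: each run is scored on three costs (lower is better), and the Pareto front is the set of runs that no other run beats on every cost at once. The `Run` fields, the "followup hours" proxy for technical debt, and the workflow labels are assumptions for illustration, and the numbers are made-up placeholders, not measurements.

```python
from typing import NamedTuple


class Run(NamedTuple):
    workflow: str
    human_hours: float     # human time cost
    api_cost_usd: float    # API cost
    followup_hours: float  # hypothetical proxy for technical debt


def dominates(a: Run, b: Run) -> bool:
    """True if `a` is no worse than `b` on every cost and strictly better on one."""
    costs_a, costs_b = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(costs_a, costs_b))
    better = any(x < y for x, y in zip(costs_a, costs_b))
    return no_worse and better


def pareto_front(runs: list[Run]) -> list[Run]:
    """Keep only the runs that are not dominated by any other run."""
    return [r for r in runs if not any(dominates(other, r) for other in runs)]


# Placeholder values invented purely to exercise the function.
sample_runs = [
    Run("workflow A", 3.0, 12.0, 1.0),
    Run("workflow B", 2.0, 9.0, 4.0),
    Run("workflow C", 8.0, 0.0, 1.5),
    Run("workflow D", 3.5, 15.0, 5.0),
]

for run in pareto_front(sample_runs):
    print(run.workflow)
```

With real data, each workflow would contribute many runs, so you'd probably want to aggregate per workflow (mean or median) before computing the front.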

#

the human part is still variable there, but having a (fixed) model means that some of those questions might have actual replicable answers now, where they didn't in the past

#

like, it should now be possible to make reproducible experiments where in one branch 'I included lots of unit tests in the spec' and in the other branch 'exactly the same, but no unit tests' and build it 10 times

deep summit
#

I think it's important to recognize how young software engineering is as a discipline. We'll still sort out best practices. This is a good related podcast: https://oxide-and-friends.transistor.fm/episodes/software-engineering-past-present-and-future-with-grady-booch

rotund folio
#

I mean also I think it's a bit harsh to say software engineering hasn't sorted out best practices. They might not be universally accepted but there are a lot of organizational structures that have been thought through thoroughly over time. OOP for example, functional programming and the general attitude towards side-effects for example...

#

I'll agree that there's also quite a lot of chaos, trends, vibes, personal preferences, etc

#

but even something like Git is sort of a convergence towards certain patterns of work

#

Part of the problem is probably that a lot of the problems that need to be solved are social and organizational in nature, so the solutions are always relative to the people (and their own experiences) participating in those patterns

wraith spruce
#

OOP actually costs more than it saves a lot of the time. Orienting your programming around a particular data structure or type is stupid and not a good practice. No one insists on struct oriented programming, union oriented programming, int oriented programming, or tree oriented programming. Objects can be very useful tools, but "orienting" your programming around them causes a ton of very bad programming practices. This is precisely one of those things that is widely viewed as a best practice due to popularity, not because it is actually broadly beneficial.

Functional programming is also an awesome tool, in the right circumstances, but it's not a best practice either. Language and language type are things that developers should choose on a per-project basis, dependent on the needs of the project. Very few companies are doing that. Instead languages and language types are largely chosen based on personal preferences, not fitness for purpose, and many companies have an arbitrary "house language" and devs aren't allowed to use any other language.

So no, it's not harsh at all to say software engineering hasn't sorted out best practices. It's just true. @deep summit is right though. Mechanical engineering is thousands of years old. Software engineering is, even if you count pre-electronics computer engineering, only around a hundred years old. Given enough time, we'll eventually sort out best practices.

(We do actually have some enumerated best practices, in very narrow industries, but they can't reasonably be applied at a broader level. Those are used in the automotive and aerospace industries. We can't use them outside of that though, because the cost in dev time is far too high for use outside of literal life or death domains.)

rotund folio
#

I would take OOP to be more the idea of inheritance and hierarchy in particular. I've certainly seen bad takes on it where over-elaboration was abused and you have files that basically have 3 lines of code directing you to another file amidst 30 lines of boilerplate. I've also seen much more sane takes on it. If I compare it with e.g. 'the best thing I could possibly come up with given infinite time and a particular narrow field', it's flawed. If I compare it to writing assembler... I have to give people credit.

#

Good multiple inheritance is really nice (in a game dev context at least) too.

#

perhaps if the word I had used was 'patterns' rather than the specific term 'best practices' it'd've resolved this?

wraith spruce
#

You are right that there are good patterns, but that's different from best practices. Best practices are rules that should be strictly adhered to, to maintain good quality. Patterns are tools you use where they are appropriate. I totally agree that there are a lot of good patterns!

rotund folio
#

So, I do expect some useful patterns to emerge in particular with regards to maintainability and collaborative elements; because right now that seems where a lot of damage can be done without having a shared idea what you're doing in advance. But for the time being that might be getting hidden under 'let's make up fantasy governments and have Claude RP them and maybe the serfs write some code' craziness

wraith spruce
#

You are probably right about this.

tulip owl
#

If all I want to accomplish is some St. Blaine style blingery, CircuitPython is decidedly the tool for the job. But if I want to tackle something more substantial, I'm not sure that any particular language is worthy of further study at this point.

It seems like assigning an agent (or several) to the task is the only path forward, at obscene monetary and environmental expense. I know those of us with decades of hard earned and domain specific knowledge will naturally resist this fact, but I always evaluate myself for luddism in situations like this, and even I can see the writing on the wall.

@deep summit outright states that no longer dealing with code-isms is freeing. Dedicating greymatter to pointers and recursion and doubly-linked lists is admittedly not what interests me either. I certainly don't want to study Zephyr any more. What a stupid waste of years that was, with nothing more than toy samples to show for it.

But I can't afford an ongoing agent license, and I'm experienced enough to know better than to take the first hit for free. So maybe this is where I gracefully bow out and go work on my reading list, rather than become an old man yelling at clouds.

deep summit
#

IMO substantial projects necessitate more architecture and engineering than language specifics anyway. That's what I've found as the scaling bottleneck.

#

Agents aren't the only way forward still. You still can type code out and use existing tools just as before.

rotund folio
#

I mean, I take the point about motivation. A lot depends on what you personally get out of the practice. If what makes something feel worthwhile or futile to learn and practice hinges on how much other people will benefit from it, appreciate it, or work with you on it then that is legitimately rough

#

But if you find the practice itself worthwhile, then it doesn't matter what other people are doing. I mean, people still make games by building them as NES romhacks. That's certainly not 'the way forward' in professional game dev, but it's their hobby and that's fine

tulip owl
# deep summit Agents aren't the only way forward still. You still can type code out and use ex...

Of course I can still code by hand whenever muses move me, and knocking out a personal prototype in just a few lines of CircuitPython is still a satisfying endeavor, but having seen the progress an agent prompt can achieve without the slog of API research and syntax details is so discouraging.

I still love to sketch portraits, even though a camera eliminates the need a thousand fold. It's slow and tedious and imprecise, but so satisfying when the recipient treasures the art. I practice my guitar and piano playing rather than record my compositions once and replay the mp3 because I like to perform music and watch people smile. The availability of technology obviously doesn't kill Art.

But what is the point of studying Zephyr when an LLM agent could solve its endless infrastructure hurdles in mere moments that would take me several exhausting months? I kept digging down layer after layer of mind-numbing complexity, hoping to eventually reach the "Aha!" moment that never came. And to what end?

When I said that your recent deep dive was so demoralizing, it wasn't to denigrate your efforts. It was the revelation that this tedium was pointless, that there's a power tool available to solve all these intricately difficult code-isms, but it's one that I just can't afford. So even though I'm capable and passionately curious, I can't participate in any substantial way that interests me.

deep summit
#

Claude has a free plan and a $20 a month plan too. Codex is available for $20 a month.

tulip owl
deep summit
#

There is a difference between choosing not to do it and not being able to afford it. I was pointing out you could use it via the free plan.

tulip owl
deep summit
#

The camera analogy is apt. A cheap camera can lead you to wanting an expensive one in the same way.

#

LLMs are just another tool with a cost.

spare tartan
rotund folio
#

Well I guess I mean, you can examine why you like what you like and what actually is rewarding. I know that for me, I need to constantly be learning something new rather than using what I've already learned. That is the actual endpoint of practice in a lot of cases. But I also want what I'm learning to tie to things I actually can enjoy and care about, not just learning arbitrarily. So I used to code big things a lot, but once I became good at it I paradoxically enjoyed it less - I've already solved those problems, it doesn't seem exciting to retrace the steps even for doing something new with it. So for example, I got into electronics stuff not because it's inherently useful or valuable, but because it was a large space I had ignored so it was fresh and I could find a lot of things I both wanted, and which taught me things in the doing. But in the end that means me spending $120 on stuff to build my own air monitor (or like $720 if we count the 3d printer setup to make the case, though I use that for other things) when I could have bought one for $50. But it's helpful for me to know that about myself, so I can decide what I want to be doing.

So I think if you find a center like that, you can kind of get some emotional armor against things changing involuntarily, trends, etc.

#

I think it also lets you focus how you are going to use new tools. Maybe you don't want them to write a single line of code for you, but you would want them to do code review and architecture analysis so you get to that aha earlier? If you're not trying to get entire working programs out from a single prompt, even unlimited free stuff is enough for that

#

but yeah I think it's also fair to consider, okay this stuff feels like a superpower, do you know yourself enough to realize you're going to end up going free->$20->$100->...

tulip owl
#

Listen, if my budget could accommodate my ambitions, I'd be neck deep in 3d printer, pick and place, oscilloscope, new GPU, etc. We all have the same wishlist, but some of you have more boxes checked.

I sincerely appreciate the suggestion @spare tartan, and I'll cautiously research the offerings, but an honest question for those of you already addicted...

Would you want to keep coding by hand if your agents were suddenly no longer available? Could you keep coding by hand at this point?

spare tartan
#

I do keep coding by hand, but like some woodworking it can feel a bit fruitless if you stop and compare your productivity to an optimised process [jigs+tools, factory or $100000 CNC].
I still enjoy it, both in fact, and realised long ago that nothing has much value in the grand scheme of things [we're all just noise on the scope of life, aka stardust], except that which you attribute to it

rotund folio
#

for me, I was coding by hand less and less over time before having the agent; I'm coding by hand a little more and coding by agent a lot more than I was doing before; if the agent became unavailable, I'd probably go back to my prior pattern

spare tartan
#

Having the boring bits of a job removed can be great to allow your time to be better spent, but there's also an innate pleasure and utility in a repetitive task where the mind is allowed to wander

rotund folio
#

what I'm trying to do is basically for each 5 hour session of Claude usage, debug one thing or add one feature by hand; not as a matter of principle but because otherwise I would not keep up with the code base

#

So e.g. now I know more about tracking down PySide6 segmentation faults due to desync between the Python garbage collector and C++ memory management than I had wanted to.

#

(this is also why I'm not so sold on the whole 'spin up one Claude agent that makes tasks for 4 more Claude agents and a 6th agent that reviews the code' kind of approach; being involved in and enjoying the process is part of the point for me, so that kind of upper management simulator style just does not appeal)

#

and that approach definitely does burn insane amounts of money

tulip owl
rotund folio
#

well I'd call using these things directly more the tablesaw, this is more like 'once I get used to hiring a general contractor, I won't want to ...' but where the general contractor is maybe taking a 400% margin

#

because I've researched multi-agent setups before professionally and they kind of are worse in general (or at least, much less efficient) barring some very simple patterns... I think people get excited about seeing the stuff scurry into motion and the broad idea of it, but I would want to see hard numbers

deep summit
#

The only multiagent I've done is having a couple separate "threads" going at once. Still bottlenecked on me reviewing them.

#

Agree on the oligarchs, that's why I'm hoping my state decides to tax folks who make more than a million in a single year.

rotund folio
#

on the plus side, open-source local models are keeping up; so in a year I would expect we'll have something equivalent to at least Sonnet 4.5 that you can run on a $1k machine; and for a $3k machine you'll probably be able to do the equivalent of Opus, at around 140W

deep summit
#

I'm also optimistic that the open LLM folks are working to match the coding agents that are paywalled currently.

rotund folio
#

right now supposedly Qwen3-Coder-Next is roughly on par with Sonnet but you'd need that $3k machine

deep summit
#

that's where I'm at money wise too. it's currently cheaper to pay the subscription than upgrade my machine

#

I don't buy that LLMs will perpetually get better at the pace they are. So, I suspect it'll get commoditized as another tool

rotund folio
#

perpetually no, but there are some free gains to be made from specialization

#

the other thing people are trying to do is to offload memorized knowledge into stuff that can be accessed via tool use, though there's a tradeoff with context size there so that may not prove advantageous

tulip owl
deep summit
#

I don't have that working yet. And someone still needs to want to make something.

#

Checking pinouts is really tedious too

rotund folio
#

I think I saw someone making circuits with genetic algorithms at a talk at NIST back in the 90s, so maybe we've had it for 30 years?

#

I don't see why you couldn't do that kind of thing with, say, FPGA development now

#

but I guess most of the stuff I'm personally interested in would be hard to do that way, or so trivial to do by hand you wouldn't bother setting up a harness like that for it; e.g. a lot of the time I just want to plug things in with i2c and use what they do, it's not complicated. And the rest of the time it's something like, I want to probe electrochemistry with a certain kind of feedback and stuff like capacitance of the probe and the ion layer around the probe matters

#

and the sorts of bugs aren't like, the logic pattern is wrong, but like 'the voltage on the probe was enough to release metal ions into the sample which impact the results'

#

That said I think it would also be interesting if at some point we could do things like 'build a phased array of ultrasound transducers to make an audio spotlight' and have it happen and be phase-calibrated without human intervention. I'd feel like I would learn something by seeing how to conceptualize the process so universally you can just do that (because e.g. in that case, it involves putting a receiver over each transducer by hand and measuring phase lags).

naive scroll
#

I am interested in the 1-bit ("1.58-bit") models like BitNet, which have the promise of much less memory and runtime. But there is less work on this than I had hoped. There is a BitNet demo: https://bitnet-demo.azurewebsites.net/. I asked it to write a CircuitPython blink program, and it wrote a MicroPython blink program. It was very difficult to get it to even partially do the right thing. I didn't expect great results given the size of the model.

#

I just upgraded my desktop, and was wondering whether I should get a graphics card (which I have never owned) to run something locally. But it's too expensive to get something with enough RAM to run a good model locally. Seems like I need 16-24GB on the graphics card.

#

i do zero gaming, so no incentive otherwise

rotund folio
#

so if it were me, I would wait a little bit to see how things hash out with unified memory architectures

#

because vram is the bottleneck and on GPUs it's very expensive; but framework PCs let you get up to 128gb of just regular RAM to run models

#

with the entire machine being cheaper than a 32gb vram gpu

#

but... it's very new, so probably will move fast

naive scroll
#

the Ultra 7 265 I got actually has a pretty good graphics engine in it (relatively speaking). But DDR5 is too expensive now. I actually had more RAM (24GB) on my i7-8700 machine than on the new machine (16GB). 16 is fine for CircuitPython development.

rotund folio
#

yeah... the current tiers of models seem to be like 4b parameters, 8b parameters, 30b parameters, and 80b parameters (and 'you don't have enough without a cluster' parameters). You can get models that code individual methods just fine on a 24gb graphics card, which will still cost you like $2k at least (that's the 30b tier quantized to 5 bits), but for the 80b models you need more and it's prohibitive unless you go unified memory right now
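
The "30b quantized to 5 bits fits in 24gb" arithmetic can be sketched roughly as below. This is a back-of-the-envelope estimate only: it counts the weights and ignores KV cache, activations, and runtime overhead, so real usage will be higher. The function name `weights_gb` is made up for illustration.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# The tiers mentioned above, all at 5 bits per weight.
for size in (4, 8, 30, 80):
    print(f"{size}b @ 5-bit: ~{weights_gb(size, 5):.1f} GB of weights")
```

By this estimate the 30b tier leaves a few GB of headroom on a 24gb card, while the 80b tier does not fit on one at all.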

#

or you could buy $30k of GPUs from nvidia I suppose

#

ok, sorry, they have a $10k GPU with 96gb 🙂

#

but yeah I think GPUs might be on the way out for this stuff

spare tartan
#

+1 for unified/shared ram, framework user here.
Don't be put off by some of the 4/8bit models, scarily capable [for clearly defined smaller tasks], big LLM to plan, smaller ones per planned task. Local is still too slow for good realtime usage, but fire and forget works well.

heady dust
#

I think that and telling LLMs to make circuits kinda falls on two ends of a spectrum with one end being 100% heuristically testing the behavior of random stuff and the other trying to make a mental model about it first. I kinda fall near the middle toward the heuristics side being a better approach to hardware, but also we don't have people lighting a trillion dollars on fire and letting us inhale the fumes doing evolutionary algorithms so I guess we're doing LLMs for now.

deep summit
#

One thing I'm experimenting with is a review UI for giving feedback back to pi:

cunning forge
#

For folks following the ClawdBot/Moltbot/OpenClaw saga... in case you're not aware, the code has been having some remarkably severe security issues for a project with such wide adoption. If you're running OpenClaw, it would be worth reviewing your exposure to RCE attacks, malicious skills, and other troubles. For example, even if you run OpenClaw on its own dedicated Pi with no interesting secrets on it, an actively malicious skill could use that Pi as a beachhead to attack other computers on the same wifi network (bypassing protection you'd normally get from your router acting as a firewall). Here's a recent article from The Register about the latest troubles:
https://www.theregister.com/2026/02/09/openclaw_instances_exposed_vibe_code/

: By default, the bot listens on all network interfaces, and many users never change it
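
For anyone unsure what "listens on all network interfaces" means in practice, here is a generic Python socket sketch (not OpenClaw's code or configuration, which isn't shown here). Binding to `127.0.0.1` accepts connections only from the same machine, while binding to `0.0.0.0` accepts them from anything that can reach the host. The helper name `bind_listener` is made up for illustration.

```python
import socket


def bind_listener(host: str) -> socket.socket:
    """Open a TCP listener on the given address; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen()
    return sock


# Loopback only: reachable from this machine, not from the rest of the network.
local_only = bind_listener("127.0.0.1")
print("listening on", local_only.getsockname())
local_only.close()

# bind_listener("0.0.0.0") would instead listen on every interface.
```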

toxic crane
#

I am working bit by bit to further sandbox the agent. RCE prompt injections are likely impossible to avoid entirely IMO, so my intention is to insulate it as much as possible. Current setup is running on a pi under a separate user account without access to sudo, or any files outside of its home directory and the media mount point to see CIRCUITPY. Sounds like iptables rules that block traffic to other devices on the local network are a good next step.
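
One possible shape for those iptables rules, sketched as a small Python helper that only builds the command lines (it does not run them). It assumes the agent runs as a dedicated user, here hypothetically named `agent`, and uses the iptables `owner` match on the `OUTPUT` chain to reject that user's traffic to private address ranges. The CIDR list is an assumption; adjust it to the actual LAN, and note that blocking the whole LAN also blocks the router, which matters if the router is the DNS server.

```python
import ipaddress


def lan_block_rules(user: str, lan_cidrs: list[str]) -> list[list[str]]:
    """Build iptables commands that reject one local user's traffic to the given networks."""
    rules = []
    for cidr in lan_cidrs:
        net = ipaddress.ip_network(cidr)  # raises ValueError on a malformed CIDR
        rules.append([
            "iptables", "-A", "OUTPUT",
            "-m", "owner", "--uid-owner", user,
            "-d", str(net),
            "-j", "REJECT",
        ])
    return rules


# "agent" is a placeholder account name; these are the RFC 1918 private ranges.
for rule in lan_block_rules("agent", ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]):
    print(" ".join(rule))
```

Printing the commands first makes it easy to review them before running anything as root.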

#

Personally I would also never allow it to access a communications network that allows incoming messages from anyone but me. i.e. reading email or messages from other users on messaging apps is a no-go for me.

rotund folio
#

yeah, still using the web ui here and letting anthropic take the risks 🙂

#

I don't even want to run claude code CLI, its default sandboxing still gives pretty broad read access to the machine

#

(does at least block networking)

cunning forge
#

From what I've seen, it seems pretty reasonable to assume that skills exist now, or likely will in the future, that can do make-me-root privilege escalation exploits on the system running OpenClaw. In that case, iptables rules or other local sandboxing wouldn't do you much good.

#

If you scroll down to the bottom of this page, https://github.com/openclaw/openclaw/security, it's got security advisories for stuff that's already happened. I think that may be just for the core of it though. From what I read, perhaps up to 20% or so of available skills are malicious

toxic crane
#

I am not utilizing any 3rd party skills at this point. And as far as I understand it, a skill that managed to escalate to root in this situation would represent a severe vulnerability in the OS, since a limited user managed to escalate to root. No matter if that came from a skill in an AI agent or anywhere else, it would likely be something that gets fixed at the OS level if it got found out.

#

though 0-day escalation vulnerabilities would definitely still be able to do whatever they want if they got into the agent, just like they could if they got into a webserver or anywhere else.

cunning forge
#

The point I was aiming for is that the ecosystem is presently viewed as exceptionally dangerous by security folk. The density of malicious stuff happening there is unusually high.

toxic crane
#

I agree with that, and am just sharing my efforts to mitigate the risk posed.

cunning forge
#

The thing that worries me is that people who are used to trusting the Adafruit brand might get the idea that it's somehow possible to mitigate the risks of OpenClaw down to a comparable level of safety as normal Adafruit stuff. Humans tend to emulate the behavior of their respected role models, regardless of what disclaimers and warnings might be attached to that behavior.

#

Currently, running OpenClaw seems about as safe as poking a 6S RC LiPo with an icepick

#

You could wear flameproof gloves I suppose

#

But it's a far cry from lighting a candle with a match

wraith spruce
#

I work in security currently, and I'm not touching it with a 10 foot pole right now. I can tell you that it's certainly tempting, as it sounds a lot like an AI experiment I've wanted to do for a long time. That said, part of the security requirements for that experiment include 100% communication isolation from any other machine, including removing audio drivers, so that it can't even try to send signals to machines with microphones via audio. I don't think I would trust doing it on a machine that even has any sort of wireless communication capacity, even with the driver removed...

#

Anyhow, be careful.

cunning forge
#

suitable PPE for running OpenClaw...
https://youtu.be/7PV__5uEwio?si=8MojYCUFA8EbkzB-&t=18

This episode of Boing Boing Video is brought to you by http://www.wepc.com/.

Experience the funky flaming glory that is DANCE DANCE IMMOLATION, a pyro-parody of the popular arcade game in which one jumps around on touch-sensitive pads underfoot in rhythm with music.

But with DDI, you do this inside a fire suit, and if you miss a step, you...

cunning forge
#

[intended subtext of the above video being: yeah, I acknowledge that it's totally possible for skilled engineers to do wildly dangerous things in ways that reduce the risk of harm to acceptable levels. But, nothing good would come of enabling people who don't understand the precautions to try it at home on their own.]

spare tartan
#

From GitHub discord:
#1035237631665115138 message

We just launched GitHub Agentic Workflows as a technical preview! Agentic Workflows let you automate repository tasks using AI agents that run within GitHub Actions. You can write workflows in plain Markdown instead of complex YAML, and let AI handle intelligent decision-making for issue triage, pull request reviews, CI failure analysis, and repository maintenance.

We would love it if you kicked the tires and gave us feedback...

We've really focused on security -- trying to make it safe to run AI agents inside of Actions. Our docs have a deep dive into our security architecture.

To get started, check out our docs and our open-source repo. If you want ideas for workflows to build, check out the workflows we've used to build gh-aw itself; they're in the markdown files in the .github/workflows directory in our repo. They range from simple things like daily reports, to "unslopping" agents like daily-compiler-quality, all the way to a meta-agent ("Q") that optimizes all the other agents.

Other helpful links:

GitHub Agentic Workflows

Automated repository agents running in GitHub Actions.

deep summit
#

Peli is the person who was working on MakeCode

#

I definitely want to try using LLMs to help with the PR and issue backlog of CircuitPython. Not sure I want the noise on the main repo though

cunning forge
#

If they've got an agent that could look through the code for things like unchecked null pointer access and other types of historically common C memory access anti-patterns, that would be pretty interesting.

#

Another one that would be really interesting is to triage old issues for which ones include enough info (code sample, etc) to determine whether the reported problem is still reproducible.

naive scroll
#

a lot of the "needs retest" require testing on particular hardware

cunning forge
rotund folio
#

I've experienced a sort of unsatisfying microcosm of this when I went from salaried to contract work; it feels better that when I do contract work I bill for the time I actually used vs being paid 'for my time to be occupied' even when I have nothing to do. But it also meant a higher intensity of working in general - 8 billable hours a day was not really sustainable.

#

I honestly think the problem is the underlying competitive way of thinking. Hustle culture, everything having to be a competition, etc has problems and basically these are getting exacerbated

#

In the same way that job loss anxiety or this need to somehow make people still need you comes up all the time too...

wraith spruce
#

I did research work for a company as part of my Masters program that was hourly, and I was responsible for tracking my own hours. At first it was a lot like your experience with contract work. I was only hired to work 20 hours a week, but working it around the rest of my school schedule sometimes required 6 to 8 hour straight runs, and it was really hard to maintain that sometimes. At one point I was taking a break to go shopping with my wife, and halfway through the shopping I realized that I had been thinking through some of the logic for the next piece of code I was going to write pretty much the entire time. I wasn't at my computer typing, but I was doing the work. So I started paying attention to what I was actually doing with my downtime, and I found that I was doing a lot more work than I was logging time for, because I hadn't been considering any of my time to be work time unless I was literally sitting in front of my computer. The reality is that more of my time was being spent thinking through problems I was trying to solve than was being spent writing the code, running the tests, or aggregating and analyzing the data, and I could be doing that thinking part anywhere.

Once I realized that, I got a lot less strict on my timekeeping. I don't mean that I got lazy about it though. I mean, I didn't exclusively count time sitting at the computer, looking at code or data, or any of that. Instead I paid more attention to what was going on in my head. If I was struggling with a difficult problem, and my wife asked if I wanted to go to the store with her, I'd go with her, let my brain work on it while we shopped, and then when I got home I would often have at least part of the problem solved, if not all of it. And I counted the time we were shopping as work time, because I was working.

Not only did this significantly reduce the pressure and help me to be less overwhelmed, I also became more productive as well!

rotund folio
#

Yeah I mean, I worked through this pattern somewhat too. I think there are kind of end extremes that are really wearing to sustain - one where your time is reserved but you have nothing to do, the other where you're operating as densely as you can (either because of crunch, or because you're holding yourself to that so you can deliver the estimate/feel like you're being fair/protecting yourself/etc). In between there are better patterns possible, but you sort of have to move off-axis a bit from the either salary or hourly models

#

so the thing I'm doing that I do like basically involves a sort of fixed block of time that I can expect to be doing work in, of a fixed length and frequency; but there's a meeting at the start of that block, and basically I just try to finish what we agreed was reasonable in that meeting

#

and if there's nothing to do that day for whatever reason, I just bill the meeting and go skiing or whatever

#

I have another thing that bothers me a lot more, where the contract is in terms of 'hours per month', timesheet included, but there's frequently stuff like 'everyone is busy with something else unrelated to you this month, just uh, do something interesting?' which isn't so good for me

wraith spruce
#

Oh, that's a pain. In fact, that sounds a bit questionable, at least if you are in the U.S. If your employer sets your work hours to a fixed block of time, it's not legally contract work, and they should be treating you like an employee, including offering certain mandatory benefits. (That's one of a handful of red flags that mean you are considered an employee by Federal law, and your employer cannot legally treat you like a contractor.)

That said, I understand the difficulty of the position. Seeking legal relief for labor law violations is difficult, and you've got to make a living, so you don't want a reputation for going after your employers.

Personally I don't mind "Do something interesting". If they've entered into a contract for my time, and they can't find something to have me do that is beneficial to them, I shouldn't suffer for that. And I can find plenty of interesting things, many of which may improve my ability to do the job. But I do understand the feeling. I haven't had that happen a lot, but it definitely can feel questionable when you expected to be doing valuable work for your pay.

All I can say is good luck. It does not sound like a fun position to be in.

rotund folio
#

oh, I actually prefer the block of time thing; this is what I negotiated

#

at least for the one that keeps up with stuff to do

#

I'm in the US, my clients are in the UK and Japan

#

sorry to be clear - I prefer the one that's a certain day of the week, I have a meeting, I do whatever, then it's done

#

because that project does keep things going

#

the other is like 'we want you to do 60 hours a month'

#

of what? eh

tulip owl
#

Wait, you guys are getting paid?

wraith spruce
rotund folio
#

yeah!

#

these are minor complaints really, but it felt relevant to the way you can get tired or burned out from stuff that you wouldn't think would do that to you

#

e.g. maybe germane to the articles posted

cunning forge
wraith spruce
#

I was wondering how long it would take the "AI is stealing IP" people to get to this. I don't see them attacking code generation yet, but I'm sure it's coming.

As interesting as the part about men throwing a fit about women using tools is, I would be very surprised if any of these people actually care whether it was a man or woman. They would have thrown exactly the same fit if Phil or anyone else at Adafruit had written the prompt and pressed the button to start.

For anyone here who might be entertaining beliefs similar to these trolls, whether on graphics, hardware, or software, let me explain something that those not familiar with how AI works rarely seem to understand: Humans learn largely from observation. I learned to program by looking at other people's code. I learned hardware design by looking at other people's hardware designs. I'm 100% sure Ladyada and every other coder alive today learned the same way. Likewise every artist alive today learned by observing other people's art. Neural networks are not making exact recordings of art, hardware designs, or software designs any more than Ladyada, myself, or my artist friends are making exact recordings of those things in our own minds.

I took a college art course in my early 20s, and I managed to reproduce a charcoal piece done by some well known artist (I forget who now, as that was decades ago) so well that the professor said she would have a hard time telling which was which in a blind viewing. No one accused me of plagiarism. No one whined that I had "stolen" the style of the original artist. No one suggested that I had violated the original artist's IP. Instead they were impressed that I was able to produce an original work that was so similar to the original that inspired it.

Now, it is wrong to claim that work you did using AI was made by another artist, but AI generated stuff is not inherently ethically different from something made by hand by the same human.

cunning forge
#

The thing that worries me is, it's hard to stay clean in a mud wrestling match.

#

Some fights are really hard for any side to ever win

rotund folio
#

I think there's hardcore fringes on either 'side' and they do what often happens in these things - they get louder, get aggressive, and try to look like the dominant view

#

it's unfortunate because I like this author's works, but I'm probably going to be avoiding them since this is going from having an opinion to brigading behavior...

#

(the other side here is the 'learn AI now or be forever unemployable!' panic/hype stuff I see occasionally)

fading mirage
#

I read the "article" and then looked at the author. I was not surprised.

#

LLMs are a reality. While I agree most of the popular ones have less than ethical sources for training, they exist.

So, is using an LLM to generate a footprint based on a datasheet using copyright protected information? Probably. However, is it any different than a human using that same information to do the task by hand?

#

or a human writing a python script to do it?

cunning forge
#

One thing to be aware of is that today's post is part of a discourse going back years. There's a whole lot of context. It's not really about the stuff people were talking about today.

#

It's just kinda bubbled up to the surface in a super public way now.

fading mirage
#

I am completely missing the gender associations in the article. But, given so many weird rants from the author, I've come to expect strawman, sorry strawperson, arguments.

cunning forge
#

I interpret this as backlash from anti-AI folk who perceive themselves as having been trolled by recent Adafruit posts

fading mirage
#

The OG post does seem to be a ragebait post by a misinformed person.

cunning forge
#

I wish the conversation could be instead about cool stuff that people were building. The way things have been heading is not that.

fading mirage
#

Agreed. Or, how to use the tools to do so responsibly.

cunning forge
#

I mean, do we really need to even talk about the tools so much? Right now the topic is super inflammatory.

fading mirage
#

I don't have an answer to that excellent question.

rotund folio
#

I mean, I don't think it's good to avoid topics that can be meaningful to people just because some uninvolved people will attack it.

#

I'm sort of coming around to, if I find myself avoiding talking about or sharing AI-related stuff because of a general 'people find it inflammatory' awareness, that's probably a sign I should actually talk about it. I shouldn't shill it, or specifically try to shove it into people's faces, but it's not generally great to let the loudest and most stringent voices get to define what other people aren't allowed to talk about.

cunning forge
#

The thing for me is that I kinda have a personal policy of doing my best to steer clear of spaces where people tend to come to have fights. Be they out on the street or online. I've been used to thinking of Adafruit as a space that was safe from that sort of thing. Now it seems less that way. That makes me sad.

rotund folio
#

I don't see those fights happening e.g. on this server though

cunning forge
#

yeah. Adafruit discord is still chill. folks here are generally awesome.

rotund folio
#

I think social media in general was a mistake 🙂

cunning forge
#

yeah... /sigh

wraith spruce
# cunning forge The thing for me is that I kinda have a personal policy of doing my best to stee...

This. Talking about it isn't a problem. Going to places where people are coming for the express purpose of having a fight is stupid and a waste of time, because they aren't there to listen or learn. They've made up their minds, and mere facts can't sway their faith. Better to let that kind of person alone and stick with people who are willing to learn and to listen. (But also, make sure you aren't there just to fight, and make sure you are willing to learn and to listen as well! Otherwise you are the problem, not the solution.)

wraith spruce
# rotund folio I think social media in general was a mistake 🙂

I'm not certain. It has a lot of pros and cons. I don't know if the pros outweigh the cons though. Personally, I don't use most social media, and most which I do use I use mainly for business purposes and try to strictly avoid topics that aren't directly related to my business. It's way too easy to get sucked into things like Facebook all day, and Reddit seems to bring out the worst in people. (I do still use Reddit a little for personal stuff, but I try to stick to uplifting stuff, and I strictly limit my time on it. Most days I don't visit Reddit at all.)

But yeah, I suspect social media has done more harm than good.

rotund folio
#

I think the issue is more that it's such a lever that individuals can get crushed by huge groups when they become a focal point

#

like stross' comment turning into a rage storm

balmy crescent
#

beyond-contextual-ad-based internet was the worst idea

rotund folio
#

I mean, Mastodon doesn't want virality but this still did happen, right?

cunning forge
#

Yeah... people on Fedi are not big AI fans

#

or perhaps, it's more like, your odds of encountering anti-AI folk on fedi are pretty high compared to elsewhere

rotund folio
#

the issue is when you encounter an anti-AI person with 10k or 100k followers though

#

like, one jerk you can block, but when tons of people jump on then it can be harmful

#

people get death threats, doxed, etc

cunning forge
#

yeah... I wish there was a way for the narrative to be about something else [edit: i.e. less AI related controversy in Adafruit posts and socials]. Things are getting to the scale where it's pretty hard to have any reasonable conversation.

#

We could use a big shiny distraction... Like, quick, somebody build something really cool

rotund folio
#

did I post the midi arranger thing I'm working on (with these tools though) already?

cunning forge
#

?

analog shard
# fading mirage The OG post does seem to be a ragebait post by a misinformed person.

It's weird though, that someone would take the time to call out a woman and owner of a small business, one focused on education and outreach, when there are countless CEOs and tech bros out there talking up all kinds of ridiculous AI stuff. As far as I can see, Ladyada is taking a very measured approach to try and experiment with the tools and see if they can add value to her workflow, and doing it in a public, transparent way. It's that selection of targets that I suspect sets the adafruit leadership off (well that, and the dramatically inaccurate ragebait post that kicked this off).

bold ridge
#

I’m amazed almost daily now. I wanted to hear a tune from an old game and Opus one-shot this in Python:

NSF (Nintendo Sound Format) to MIDI converter.

Emulates the 6502 CPU to execute NSF init/play routines,
captures writes to NES APU registers, and converts the
resulting audio channel data into MIDI note events.
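For the last step in that description, here is a minimal sketch of how a captured pulse-channel timer value might be mapped to a MIDI note number. This is not the converter Opus produced. The function names are hypothetical, and it assumes an NTSC CPU clock and the usual pulse-channel relation f = f_CPU / (16 × (t + 1)).

```python
import math

NTSC_CPU_HZ = 1_789_773  # assumption: NTSC 2A03 clock; PAL consoles differ


def pulse_timer_to_hz(timer: int) -> float:
    """Frequency of an NES pulse channel for an 11-bit timer value."""
    return NTSC_CPU_HZ / (16 * (timer + 1))


def hz_to_midi_note(freq_hz: float) -> int:
    """Nearest MIDI note number, taking A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def pulse_timer_to_midi(timer: int) -> int:
    """Hypothetical helper: APU pulse timer value -> nearest MIDI note."""
    return hz_to_midi_note(pulse_timer_to_hz(timer))
```

A real converter would also need to track note on/off from the volume and length registers; this only covers pitch.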

tulip owl
#

I used to enjoy watching Scott's deep dives because even if I didn't completely understand the codebase, I could still glean tips and tricks for troubleshooting and best practices, and the repo slowly became a little more familiar as he toggled back and forth through the hierarchy sprinkling print() watermarks for debugging.

However, watching someone prompt a sentence or two and then 30 pages of logic and generated code rapidly scroll by is neither informative nor enlightening. Instructing an agent (or employee) to accomplish a task seems more like MBA training, without the somewhat interesting motivational psychology aspects, so I've stopped watching AI coding videos altogether.

I spent over a decade as university professor teaching live digital audio engineering, and I can certainly appreciate the challenges of simultaneously entertaining and engaging students in a difficult subject, so I'll be curious to see how beginners approach studying programming from here onward. Because it looks even more daunting now than the 8bit days.

deep summit
#

Are folks here interested in me posting links to related blogs as I see them? I feel like I've curated a pretty good list of levelheaded folks.

wraith spruce
#

If it's discussions acknowledging both pros and cons, benefits and limitations, etc..., I think it's a good idea. If it is just one sided promotion, I don't think it will help anything. (One-sided attacks probably also wouldn't help, but I don't see you posting anything like that.)

I also suspect some people here might like "how to" content, if you know of any especially good sources.

cunning forge
#

re sharing links and blogs... yeah, it could be interesting to see what you've been reading and thinking about. That said, I'm definitely in the mixed-feelings-about-AI camp. I'm not at all excited about one-sided pro-AI content. It would be great to see Adafruit-paid folk engaging with the idea that AI critics have a non-zero number of valid points. I'm very interested in content about, "how do we manage the unavoidable changes that AI is bringing?"

People in this channel have been great and reasonable. Some of Adafruit's public comms on AI topics in the past month have been pretty far towards the pro-AI absolutist end of things. Some of the coded language and imagery used in recent social posts on official Adafruit accounts appears like it was intended to troll AI critics (overlapping with crypto critics). As a customer, I find trolling to be very distasteful. As a writer, I don't want my name appearing on a site that posts trolling content. I would like to see more balance and nuance.

deep summit
#

My interest is particularly on how AI tools change software development

#

They are less "is it good or bad" and more "how do we use it"

#

I don't read much discussion of the ethics of it. It's mostly folks who accept that it's a tool they'll use in the future

rotund folio
#

what I'd really like to see - I think I mentioned this on this channel even - is careful, quantitative comparisons of different methodologies

#

so like, using LLMs to quantify technical debt or code quality; quantitatively comparing implementation vs planning vs debugging costs, looking at what happens to stuff like code quality, length, etc over time

#

collaboration structures - how to work with 10 other people also using AI assists on a single project

deep summit
#

I feel like we'll get that over time but it's too new atm

rotund folio
#

social stuff like, how do you even get people to work together on projects now that everyone can customize their own thing

#

how do you reuse work?

deep summit
#

I'm not sure folks have settled on this

rotund folio
#

well thats why I'd be interested in reading it 🙂

deep summit
#

I'll keep an eye out

rotund folio
#

so an example someone sent me yesterday is from last october, it's a quantitative comparison of what happens when you expose 'ask an LLM' as a console command that an LLM can call. https://alexzhang13.github.io/blog/2025/rlm/

#

the nice thing is they actually do look at token costs for this approach vs other approaches quantitatively

wary panther
# wraith spruce I was wondering how long it would take the "AI is stealing IP" people to get to ...

Here's another perspective on the debate:
When Adafruit uses LLM code generation to expand their new product offerings while remaining at the same employee count, they're enabling their customers to be less reliant on big tech (such as with a larger variety of DIY smart home projects), at the cost of society as a whole becoming more reliant on big tech (such as the broader implications of DRAM and NAND flash shortages, which have mostly been the result of increased AI adoption).

rotund folio
#

I suppose my perspective on the debate is: generally people are allowed to make choices and they aren't obligated to be strategically or consequentially optimal for my particular set of goals or values. There's a certain minimum threshold of consequence before I think it's reasonable to nitpick on someone's choices, especially publicly, and a lot of stuff I see people put together to justify a firm position on the choices of others just doesn't manage to reach that threshold for me.

wraith spruce
# wary panther Here's another perspective on the debate: When Adafruit uses LLM code generation...

I don't think the AI boom is going to pan out. They've made so many glaring business mistakes that the odds of winning are worse than gambling in Vegas. Investors will start bailing 2 years in (this is the typical deadline where Western investors bail if they aren't seeing significant gains), when the RAM companies have ramped up production for AI optimized RAM but before they've actually produced enough for any data centers, and at the same time the investors will start looking for the companies waiting for the data centers, which just won't be there. The AI boom will crash, making RAM ultra cheap, and making machines that can run lots of local LLMs affordable even for regular consumers.

So, I think what Adafruit is doing will enable consumers to be less reliant on big tech in the short run, and the crash of the AI bubble and RAM prices will enable consumers to be less reliant on big tech in the long run.

That's my personal prediction, based on a combination of observation of the tech industry over the last few decades, a few college business classes that talked about credit and speculation, and on my personal knowledge of how these AI LLMs work (my Master's degree was very heavily focused on LLMs and AI).

That said, I'm just one person making an educated guess, and things can change rapidly. I don't see any way this AI bubble doesn't eventually burst, but there may be some ways it could be drawn out for significantly longer (with a bigger end crash).

(As far as personal use of LLMs for code generation goes, I'm not doing it right now, because it won't work for what I'm currently doing. I'm working almost exclusively extremely close to the CPU (microcontrollers) where LLMs don't (yet...) have enough domain knowledge, and on security research developing and testing algorithms that no one, including the AI, has ever seen before. One of these factors will eventually change, and I'll start experimenting to see if I can get productivity gains then.)

rotund folio
#

Side note but, I think 'productivity gains' are maybe a bad way to think about how to relate to these tools. This is related to the burnout discussion...

#

The interesting stuff IMO is where they lower the barriers to doing something, so what you can actually manage to do changes qualitatively

#

I'm much more interested in what happens when many many more people can say to themselves for example 'Discord is becoming a bad place to be -> okay I'll just make a clone'

#

even if their actual day job is like, pipetting or something

#

not that there can't be productivity gains, but I think that narrative ends in something like 'be happy that you can generate twice the value for your employer while only half as many of you receive the same salary as before but also you get to be more tired at the end of the day'

deep summit
#

I agree there is a bubble to burst and look forward to cheap RAM. I don't think it'll cause me to not use LLMs day to day for software engineering.

cold barn
#

Hey all, I work in semiconductors and got tired of LLMs hallucinating register addresses when I feed them raw datasheets. So I built RegisterForge (https://regforge.dev), which preprocesses datasheets into clean, validated, machine readable register data you can use in your LLM workflows.

Looking for more devices to add to the library. If there's a chip with a rough datasheet you'd want parsed, let me know or hit the request button on the site.

wraith spruce
# rotund folio Side note but, I think 'productivity gains' are maybe a bad way to think about h...

When I say "productivity gains" I don't necessarily mean more net production. I mean production per unit time. If LLMs increase productivity in this way, that means more potential downtime, which means less burnout. They definitely shouldn't be used to try to squeeze even more value out of people who are already burning out.

Lowering barriers to entry is a good angle, but we've already seen that backfire a lot with things like code camps. Lowering the barrier to entry means more low quality applications out there. We saw this during the late 1990s Java boom, when code camps (not called "code camps" at the time, but they are exactly the same thing) produced tens of thousands of Java programmers that largely only produced garbage. We've seen this many more times over the last 15 years with code camps mainly for web frameworks, which again mainly turned out a ton of garbage.

I think if people are determined to learn how to program themselves, LLMs are awesome for filling in the gaps while they learn the base stuff they need to know before they can learn how the deeper stuff works, but if they are just going to learn to use the LLMs and never learn to program themselves, they are going to end up producing enormous amounts of garbage, because they will never gain the skills to vet the code the LLMs are producing for them.

So, as a tool to reduce the learning curve and then to use to improve productivity once you've done the learning and are capable of programming without an LLM, I think LLMs are great. As a crutch to fake being a programmer and impose upon the world more tons of crappy code, I think we could do without.

I don't think we need people who can't program at all trying to write a Discord replacement with LLMs. Without a basic capacity to review and check the code, all it will do is drive everyone back to Discord and make them feel like failures.

rotund folio
#

I suppose what I'm getting at is that the externally-focused framing tends to sacrifice the self on the altar of whatever social and economic powers there are; which makes LLMs basically chew people up and spit them out, as it were...

rotund folio
#

Whereas the internally focused framing of 'what does this let me do?' is empowering.

wraith spruce
# cold barn Hey all, I work in semiconductors and got tired of LLMs hallucinating register a...

First, this is a major reason I'm not currently using LLMs. They don't handle very low level stuff well, especially microcontrollers, due to lack of context, and datasheets don't provide that very well.

Second, this sounds like an awesome tool for exactly my use case! Register addresses aren't the only problem, but they are one of the biggest ones. (Other things include unique hardware that even the datasheets don't do a great job of explaining, like the RP2350 HSTX or the RP2040 and RP2350 PIOs and interpolators.) This is a massive step forward though!

rotund folio
#

I guess I'd prefer to view it the way one might talk about someone picking up a painting hobby to decorate their house... they might produce 'garbage art' from some professional standpoint, but its a nicer world to live in if we say 'hey that's awesome, you're having fun!'

#

a lot of garbage code might be produced, but if people feel that the garbage code they produced is actually helping them, that's enough

wraith spruce
#

I'm mainly talking about the kind of people who use AI to generate essays for homework, who just put in the prompt and then copy/paste the results without ever even reading it. Those people don't even want to learn, and they are generating worthless electronic litter by their use of AI. If people produce waste in the process of learning, that's normal and expected. If they produce waste because they won't learn, that's a problem.

rotund folio
#

ehh, this gets into the whole 'below the threshold of reasonable nitpicking' to me

#

like, if I'm tolerating someone streaming netflix for 8 hours, I should tolerate someone using the same resources to generate something random

#

how dare someone drive up into the mountains to go camping, think of the CO2 costs! etc

#

I agree that people who do that kind of thing aren't learning, but that's kind of... so be it?

wraith spruce
#

Maybe you are right. I mean, I'm not saying this is a justification for not using AI. It's just a negative side effect of it existing, and I don't think it outweighs the positive. I do think we shouldn't be trying to sell AI as "You can make apps without learning to code!" though, because that will attract the people who will make millions of garbage apps that clutter app stores and the internet so badly that you can't tell what is good and what isn't.

rotund folio
#

to put it another way, the carbon intensity of paying for compute to run LLMs is slightly lower than the carbon intensity of investing that same money in the market as a whole, so its pretty neutral to me

#

well I don't particularly think we need to 'sell AI' either

wraith spruce
#

I'm not that worried about the carbon footprint of AI in this context. It's the proliferation of garbage that makes it difficult to filter out what is and is not good.

rotund folio
#

also I have to be honest with myself - the code quality from Claude one-shotting something is higher than what I normally produce in a totally human way 🙂

#

there's weird stuff but also, it actually has comments and follows PEP8 and so on

wraith spruce
#

Lol! I mean, I guess that's a good point. But then, the very fact that you know that means that you have some investment in learning.

#

(I don't follow PEP8. I follow most of it, but there are parts I disagree with that I conscientiously don't follow...unless I'm working in someone else's code base, and they are following it, of course. I'm not a monster.)

rotund folio
#

Sure, and its both useful and enjoyable for me for myself to be that way. But I wouldn't want to impose that on others who find themselves in other circumstances

#

hm, how would I put it... I think caring about things comes from passion and energy and the feedback loop of seeing those things actualized, so when I see something that helps people find passion and energy even if its awkward at first, I'm optimistic

#

whereas when people are being driven by external things ('I need money to live', etc) I find that's where you get the more harmful patterns where people do lots of things they fundamentally don't care about, just to meet that external pressure

#

so like the student using an LLM for their essay - if we didn't tell that student that doing poorly on the essay would tank their future, they probably just wouldn't generate it

#

this is a bit of a faith in people thing though I guess?

wraith spruce
#

Ok, so I'm going to amend my position a little bit:

If people who aren't willing to learn to program want to use LLMs to generate applications for personal uses, I have no problem with that. I write up small throwaway Python programs to do trivial but tedious things all the time, because I need to calculate something, I want to check if something is true, or I need to do some sort of analysis. If they want to use it for things like that, and are willing to take the associated risks, that's their business. (I'm currently running a 29 line Python program that opens a socket on a specified port, listens for connections, and when it gets a connection it prints the IP address, the first 32 bytes transmitted, and the number of connections accepted since the program started, and then closes the socket. I'm running it on port 80 and port 18789, though I think my router is blocking incoming traffic on 18789. I wrote this two days ago, to see if I could detect attack attempts on 18789. I've had 0, compared to port 80, which is forwarded and has had 905 attack attempts. I'm sure an LLM could have written it in a fraction of the time, and I don't really care that much about quality so long as it does what I want.)
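A throwaway listener like the one described might look roughly like this. It is a sketch, not the actual 29-line program: the function names and the 2-second receive timeout are assumptions added for illustration.

```python
import socket


def handle_one(server: socket.socket, count: int) -> int:
    """Accept one connection, log it, close it, and return the new count."""
    conn, (ip, _port) = server.accept()
    with conn:
        conn.settimeout(2.0)  # assumption: don't hang on silent clients
        try:
            first = conn.recv(32)  # at most the first 32 bytes sent
        except OSError:
            first = b""
    count += 1
    print(f"{ip} {first!r} total={count}")
    return count


def listen(port: int) -> None:
    """Listen forever on the given TCP port, logging each connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(("", port))
        server.listen()
        count = 0
        while True:
            count = handle_one(server, count)
```

Calling `listen(18789)` would start it on that port; binding to port 80 usually needs elevated privileges.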

People who are trying to create and sell apps this way though, I do have a problem with, because they don't even have the capacity for basic quality control. It feels dishonest to put a price on something like this, because they can't possibly have a clue what it is actually worth.

rotund folio
#

yes, 'sell' immediately makes my hackles raise 🙂

#

and I think there's also valid concerns about people being overeager to contribute to some opensource thing but doing it in a way harmful to the project (again though I have problems with this as a human, and that means I contribute much more rarely to opensource projects than perhaps I should)

wraith spruce
#

I think contributing to open source this way depends on the willingness to learn thing. Contributing to open source is an awesome way to learn, and even if your code isn't great, most projects will appreciate the attempt and maybe even help you learn to do it right. If you aren't willing to learn though, please don't try to contribute to open source just by generating code with an LLM. It just wastes everyone's time.

But yeah, a lot of people who are willing to learn don't contribute to open source as much as they would like to, because they don't think they are good enough, and that's a shame.

rotund folio
#

I can imagine even a role for won't-learn-LLM-generators - its the same role that 'rapid prototypers' have in professional endeavors

#

which is basically, they make something that acts as a mockup that others can interact with to see if a given direction is even a good idea

wraith spruce
#

Actually, I can imagine a company assigning some non-technical desk employee to use an LLM to put together a mockup app, so that they can then bring that to a commercial software company to show them what they want.

#

I've worked a little bit in this kind of software dev, and clients often have ideas in their heads that they struggle to express in words. On top of that, there are often disconnects within the client company, where different people have a different idea of what they are getting, and it turns into a huge mess once they see what is actually being made. If they could have someone do a mockup with an LLM and then review it and work out the disagreements and misconceptions before going to the software dev company, they could save a ton of time and a lot of money!

I'm going to keep this in mind, in case I ever end up on either side of this. The potential savings just in making sure everyone is on the same page and knows what they want is huge, and there's a lot of social capital saved as well (mainly frustration caused when everyone starts arguing about how this isn't what they wanted at all).

rotund folio
#

if you're ever in a position to say if it worked without breaking an NDA I'd be curious to know

wraith spruce
#

I hope I'm not, as this is not exactly my favorite kind of work, but if it does happen, I'll try to remember to report back!

rotund folio
#

heh

#

I'm not at that interface but I guess actually there are people I could share the idea with who might be able to use it productively too

tulip owl
#

As an intermediate programmer with very wide but relatively shallow knowledge, I'm well aware of how much I still don't know.

I'm currently deer-in-headlights frozen with the amount of study ahead of me, doubting whether it's worth the effort. Adding LM agents to my already unmanageable syllabus has overwhelmed me, since I can clearly see that software engineering is quickly becoming a supervisory skill.

Reaching deep enough knowledge to both expertly prompt and wisely evaluate agent output is such a hopelessly distant goal that I just can't motivate further progress, because none of this seems fun anymore.

deep summit
#

I don't think you need to expertly prompt because you can always test, throw away and retry.

rotund folio
#

I'd start by asking it to do something you already know how to do and have done. There'll usually be recognizable stuff and surprising stuff.

So e.g. if you taught digital audio engineering, ask it to code something from that field and see if it comes up with usual patterns or strange ones.

For example, I asked for a pitch shifter and it gave me a (bad) vocoder approach. I picked at it, and it went to a method using a ring buffer which was better.

I imagine you might already have opinions about that?

wraith spruce
# deep summit I don't think you need to expertly prompt because you can always test, throwaway...

I got really good at prompt writing for some of the Stable Diffusion image generation models just by spending some time doing it. Overall I maybe spent 100 hours, which is not that much time when you spread it over a year or so, and even better, just the first 10 hours is where I had the vast majority of my improvement. The last 80 hours were learning tricks and workarounds for when the model didn't understand specific words. (For example, "troubador" didn't work, but you can get pretty close with "jester holding" some instrument and a handful of elements fine tuning the details.)

Coding LLMs should be much faster to learn. I took around a year to a year and a half on the image stuff, mainly because the fastest I could get on my computer was around 30 minutes a prompt (you have to do multiples, like 4 at a time, because a single image doesn't give enough information about what the prompt understood and what it didn't). For images with usable resolution, it took 60 to 90 minutes for a 4-pack, depending on settings. Coding LLMs have much faster feedback, so you don't have to stretch 100 hours out over a year or more to get really good at prompt writing.

south python
#

I don't see the pride in craft there anymore. It feels like middle management to me

deep summit
#

In my mind middle management just sits in meetings making vague decisions. I like the chef metaphor because you are still hands on with the output and making sure it is what you want.

rotund folio
#

You also don't have to do it that way. I think you could totally reverse it and ask the LLM what you should code, if you wanted

#

Honestly they're inconsistent enough at debugging that just sending it back is bad practice IMO

deep summit
#

I think they mean that they send it back with more context

wraith spruce
# rotund folio You also don't have to do it that way. I think you could totally reverse it and ...

This is probably how I'll end up using it, once it becomes better suited for the kind of stuff I do, or once I start doing things it is better suited to. This is already my normal coding pattern to some degree. I never copy/paste StackOverflow code. I look at it and figure out how it works, then I write the code I need based on what I learned. I don't use StackOverflow for code examples much anymore though. Nowadays it's more like looking at the code for 4+ different HSTX/DVI drivers, trying to figure out what the heck it is doing and why, then asking people who give me 80% of the answer and then never respond again, and then I use all of that to write a driver that doesn't work quite right and I have to work out the rest on my own. (But AI definitely wouldn't be able to do it, because there's not enough example code, no complete explanations, and even the datasheet is far too vague.)

Anyhow, the cost of doing it this way is that I'm still writing the code myself. I've mentioned before though that time spent typing is very small compared to everything else when programming, so this is a small cost. The gain I get from it is that I actually understand what the code is doing, it's more maintainable because I wrote it myself, and the code I wrote is optimized for the specific task I need it for. Typically code found online is either intended for a different task, or it is so massively overgeneralized that it is extremely inefficient and doesn't even do what I need it to very well. I imagine AI generated code is somewhere between this and task-appropriate code, so it seems to me the same "use as reference for writing task-optimal code" strategy is probably the best. (Though perhaps this will change as coding LLMs improve.)

rotund folio
#

From what I can tell, AI generated code won't prioritize optimization by default, but AIs can reason through optimization if asked. However the actual importance of particular optimizations is somewhat more prone to error than other things

#

e.g. it might highlight something you do literally once that could be done more efficiently, over e.g. refactoring the whole algorithm to inherently be easier to make efficient

#

but if you sort of know what you're going for, I find it very useful for things like vectorizing code for example; you have something you write out as serial for loops and say 'write this as a one-shot numpy expression'

#

also thats nice because its very easy to test
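To make the "serial for loops to one-shot numpy expression" idea concrete, here is a small sketch. The pairwise squared distance task and the function names are made up for illustration and are not the code from that conversation. The point about testability is that the slow loop version stays around as the reference.

```python
import numpy as np


def pairwise_sq_dists_loop(points: np.ndarray) -> np.ndarray:
    """Reference version: explicit serial loops over every pair of rows."""
    n = points.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            diff = points[i] - points[j]
            out[i, j] = np.dot(diff, diff)
    return out


def pairwise_sq_dists_vec(points: np.ndarray) -> np.ndarray:
    """Same computation written as a single broadcast expression."""
    diff = points[:, None, :] - points[None, :, :]  # shape (n, n, d)
    return (diff ** 2).sum(axis=-1)


# The check is one line: compare the rewrite against the loop version.
rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))
assert np.allclose(pairwise_sq_dists_loop(pts), pairwise_sq_dists_vec(pts))
```

The broadcast version builds an (n, n, d) intermediate array, so it trades memory for speed; that is the kind of tradeoff worth checking before accepting a generated rewrite.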

#

For example I had a very, very slow 'average curvature' calculation I had to do over a mesh; igl (a Python lib with this built in) was very inefficient when the curvature needed significant blurring. DeepSeek wrote a per-vertex for loop in python as a first attempt at a faster version, but when prodded, wrote something using torch sparse matrix libraries that I hadn't been aware of

wraith spruce
# rotund folio From what I can tell, AI generated code won't prioritize optimization by default...

It's not necessarily performance or memory optimization. Most web frameworks (for example) aren't that well suited to any particular application. They might have say, a menu system, but the way it is built won't work very well for your application unless you change the application itself to fit the framework. This is true of almost all "reusable code". It's rarely well suited to any particular use case. Given that these AI models are trained on a huge number of framework-like "reusable code" instances (because they are so common), they are naturally going to prefer that kind of generic programming rather than programming to suit your specific use case. So they won't produce code optimal to your application. Of course, you are right that they aren't great at performance/memory optimization either, but that's not the primary issue in this case.

rotund folio
#

ah, yeah, that's something to map out in advance and be specific with if you must...

rotund folio
#

claude had a tkinter preference for awhile that was sort of awkward; everything I made with it had the exact same flickering UI bug

wraith spruce
# rotund folio claude had a tkinter preference for awhile that was sort of awkward; everything ...

Yeah, this kind of thing. Say I'm writing a Pygame application (which I do a fair amount), and I ask the AI to write up a sidebar button menu. Even if I tell it to do it in Pygame, it will likely attempt to use tkinter functions, because Pygame doesn't have built-in GUI widgets. And even if I manage to get it to implement the GUI widgets, it's going to write tkinter or GTK style interfaces which are great for normal GUI applications but terrible for real-time video game applications. By the time I've worked out a prompt that generates a system for GUI widgets that is well designed for video game applications, I could have just written and tested all of the code for it myself.

rotund folio
#

I suppose what I'd do is, separately, have it write a library for GUI stuff in pygame; have it write documentation for that; then put that documentation into the context for the actual thing I want it to develop

#

it'd be slower once, but then I'd have it for every future project

#

and it has that kind of good modular structure that AI seems to work well with

wraith spruce
# rotund folio I suppose what I'd do is, separately, have it write a library for GUI stuff in p...

You could do that, but then you'll have a GUI system that uses the Observer pattern (because that's what every GUI system uses, and that's what the LLM knows), which creates a ton of input lag.

For real-time video games, you would want a GUI system that injects GUI interaction events into the event queue to be handled in the event handler. That avoids the input lag associated with multi-listener/multi-layer Observer style event handling, and it allows you to optimize in the event handler if optimizations are needed.

The problem is that every open source GUI system is designed for productivity applications where a bit of input lag doesn't matter. Video game companies don't use publicly available GUI systems. They roll their own, often on a per game basis, precisely because of this problem. So the AI won't be familiar with GUI systems that use an event queue instead of a heavily layered multi-listener Observer pattern. Even if you told the AI to generate a GUI library that uses an event queue instead of Observer, it probably won't understand, because it has never seen the event queue style.

If you are only doing casual turn-based games, I'm sure it could build a GUI system that works very well for that, but for real-time games, I'm not sure there is a prompt that would work. Maybe AI will eventually get there, but it's still very young and inexperienced, especially in things that are mostly done in proprietary software that doesn't have publicly available code.
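For readers unfamiliar with the distinction, the event-queue style described above can be sketched in plain Python. This is an illustration only, not code from any real game engine: the `Event` and `Button` types and both function names are hypothetical, and a `deque` stands in for the framework's event queue (in Pygame the analogous step would be posting a custom event with `pygame.event.post`).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    kind: str      # e.g. "mouse_down" or "button_pressed"
    payload: dict


@dataclass
class Button:
    name: str
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def translate_gui_events(event: Event, buttons: list, queue: deque) -> None:
    """Turn a raw mouse click into a GUI event appended to the same queue."""
    if event.kind != "mouse_down":
        return
    px, py = event.payload["pos"]
    for button in buttons:
        if button.contains(px, py):
            queue.append(Event("button_pressed", {"name": button.name}))
            break


def run_frame(queue: deque, buttons: list) -> list:
    """Drain the queue once per frame in one handler; no listener callbacks."""
    pressed = []
    while queue:
        event = queue.popleft()
        translate_gui_events(event, buttons, queue)
        if event.kind == "button_pressed":
            pressed.append(event.payload["name"])
    return pressed
```

The widgets never call back into game code; they only add events, so all handling (and any optimization) stays in the one per-frame handler.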

#

Of course, that's just one example. In many cases not being perfectly fit for the use case isn't a disaster. People have been adjusting their web site designs to fit the constraints of frameworks for decades now. But there are cases where that isn't acceptable.

rotund folio
#

I would just specify what you want and try it?

rotund folio
#

I think if you just asked 'give me a GUI' you'd get what you just said, but since you know to ask for a different way of structuring it, I think it should follow that pretty well

wraith spruce
#

For all I know, I'm wrong about this one specific case, but there are tons of others. And it's not a reason to not use AI altogether. It just means sometimes you'll ask the AI to do something, it will fail really badly to do what you need, and then you'll have to evaluate and decide whether you want to risk taking more time trying to get the AI to do what you want, or if you should just write it yourself.

(As far as the AI following a prompt to write a GUI system that uses an event queue instead of Observer, that only works if it has seen a lot of instances of event queue based GUI systems. If it hasn't, it can't do it, because it doesn't have the training to be capable of it.)

rotund folio
#

ehh, it can do things it hasn't seen but you need to be specific

#

like, actually design it

#

one of the test things I did (wrote it 5 times 5 ways) was an idle game except that you could scrub back and forth through time and change your actions

#

which required a very fiddly set of constraints on what game mechanics were actually allowed

#

I did have to tell it how to do it ('everything piecewise linear, inject events into the future for when boundaries will be met, recompute here, not there, keep these events, not those') but it did work
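A rough sketch of the "piecewise linear plus injected future events" idea, under simplifying assumptions that are not from the original project: one resource, one cap, and a full recompute from time zero on every query rather than the selective recomputation described. The function name `simulate` and the `epoch` counter used to discard stale boundary events are hypothetical.

```python
import heapq


def simulate(actions, cap, t_end):
    """Return the resource amount at t_end.

    actions: list of (time, new_rate) player choices with non-negative rates.
    The amount grows linearly between events and is clamped at cap.
    """
    # Heap entries are (time, order, kind, payload); `order` breaks ties.
    heap = [(t, i, "action", rate) for i, (t, rate) in enumerate(actions)]
    heapq.heapify(heap)
    order = len(actions)
    t, amount, rate = 0.0, 0.0, 0.0
    epoch = 0  # bumped on every rate change to invalidate old boundary events

    while heap and heap[0][0] <= t_end:
        ev_t, _, kind, payload = heapq.heappop(heap)
        if kind == "boundary" and payload != epoch:
            continue  # scheduled under an old rate, so drop it
        amount += rate * (ev_t - t)  # advance along the current linear segment
        t = ev_t
        if kind == "action":
            rate = payload
        else:
            amount, rate = cap, 0.0  # boundary reached: clamp and stop growing
        epoch += 1
        if rate > 0 and amount < cap:
            # Inject the future event for when this segment will hit the cap.
            hit = t + (cap - amount) / rate
            heapq.heappush(heap, (hit, order, "boundary", epoch))
            order += 1
    return amount + rate * (t_end - t)


timeline = [(0, 2.0), (3, 1.0)]  # rate 2 from t=0, then the player drops to rate 1 at t=3
for when in (2, 4, 6, 20):
    print(when, simulate(timeline, cap=10, t_end=when))
```

Scrubbing through time is then just calling `simulate` with a different `t_end`, and changing a past action means editing `timeline` and querying again.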

#

that said it was very much not modular and kind of un-continuable

wraith spruce
# rotund folio ehh, it can do things it hasn't seen but you need to be specific

Sorry, but that's not how neural network AI works. It can sometimes combine two things it has seen into something in the middle it hasn't seen, but this often produces garbage. The theory is that if you train it enough, it can get in-between stuff more consistently, but in reality it is turning out that there's a hard limit on this. And when it hasn't seen one side almost at all (event queue injection is fairly uncommon outside of OS and video game programming), it can't even do an in-between.

wraith spruce
rotund folio
#

well if we're boiling it down, all it has to do is generate the next token appropriately

#

so if you construct the problem so that that is in distribution, the overall iterative process can go very far out of distribution

#

that's part of why the chain of thought stuff works

#

and the whole 'let's think step by step' that preceded it

#

I'm not sure I'm going to be baited into making a pygame GUI library to prove my point though 🙂

#

but, you said you use stable diffusion right?

#

remember controlnets?

wraith spruce
#

Ok, I'm not trying to be rude, but do you know how neural networks work? Do you know how training works and how it makes the neural network capable of producing the outputs you want? AI is not intelligent. It can't work out things it has not seen. It can put together things it has seen to make new compositions of them, but it can't include components it has never seen. Sometimes you can work around this (for example, I got stable diffusion pretty close to a troubadour by telling it to generate a jester with a musical instrument), but this doesn't tend to produce very good results.

Sure, with a controlnet you can shape the results. This works not by improving the AI or teaching it something new. It works by manipulating the random noise image that the model is iteratively refining to try to force it into a particular shape, so that the model will start recognizing parts of the image as limbs and such. It can't make the model produce a component it has never seen before though.

A better example for Stable Diffusion would be LoRAs, which are essentially miniature models that get injected into the SD model in memory. LoRAs are minimodels that you train on something that SD has too little training on to generate. For example, SD before 3.0 doesn't have training on glowing eyes, so you can't get it to generate something with glowing eyes. But there are a few LoRAs you can add that will give it that training. No amount of controlnet will achieve that though.
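For readers who haven't seen the mechanism: low-rank adaptation is usually described as adding the product of two small matrices to a frozen weight matrix. The toy below shows only that arithmetic with plain lists. The matrices, the `scale` parameter, and the helper names are made up for illustration, and it says nothing about how any particular Stable Diffusion tool loads an adapter.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(w, b, a, scale=1.0):
    """Return w + scale * (b @ a) without modifying w."""
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Frozen 3x3 base weights (identity here, purely for illustration).
w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
# Rank-1 adapter: b is 3x1 and a is 1x3, so it stores 6 numbers instead of 9.
b = [[1.0], [0.0], [2.0]]
a = [[0.5, 0.0, 0.0]]

patched = apply_lora(w, b, a)
print(patched)  # [[1.5, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
```

The base weights stay untouched, which is why an adapter can be added or removed at load time.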

Now, if you have a code generator LLM that only understands event queues in terms of something that the OS provides that you only ever read from, it won't be able to figure out a GUI that uses event queue injection. But if that LLM takes something equivalent to a LoRA (are Claude "skills" basically that?), then it would be able to.

rotund folio
#

I train neural networks professionally, yes

wraith spruce
rotund folio
#

I'm first author on this, for example: https://arxiv.org/abs/1612.04530

#

the thing is that 'in distribution' and 'out of distribution' depend on the representation of the space; a lot of why neural networks work well is that you can move problems from spaces where each instance is unique to spaces where instances overlap in structure

#

so in the controlnet case, you're saying 'I'm providing the global structure, the only job of diffusion is texture', which takes a very out of distribution thing like making a McDonalds made out of ivy into 'make ivy that follows these contours'

#

similarly, if I'm asking an LLM 'make code that detects a mouse click against the alpha mask of this texture' it's seen that. If I say 'now submit the mouse click into a queue' it's seen that. If I say 'now process the queue one event at a time' it can do that. But what I've made by putting those together can be something it's never seen
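The three steps in that message can be sketched in a few lines each. This is an illustration with a made-up 4x3 alpha mask and hypothetical function names, not code from anyone's project: hit-test against the mask, submit the click to a queue, then process the queue one event at a time.

```python
from collections import deque

# Hypothetical 4x3 alpha mask for a sprite: 0 is transparent, 255 is opaque.
ALPHA = [
    [0,   255, 255, 0],
    [255, 255, 255, 255],
    [0,   255, 255, 0],
]


def hit(mask, origin, click, threshold=128):
    """Step 1: is the click on a sufficiently opaque pixel of the sprite?"""
    col = click[0] - origin[0]
    row = click[1] - origin[1]
    if not (0 <= row < len(mask) and 0 <= col < len(mask[0])):
        return False
    return mask[row][col] >= threshold


events = deque()


def on_click(click, origin=(100, 50)):
    """Step 2: submit the click to the queue instead of acting on it."""
    if hit(ALPHA, origin, click):
        events.append(("sprite_clicked", click))


def drain():
    """Step 3: process the queue one event at a time."""
    handled = []
    while events:
        handled.append(events.popleft())
    return handled


on_click((100, 50))   # transparent corner pixel
on_click((101, 51))   # inside the opaque area
on_click((300, 300))  # outside the sprite entirely
result = drain()
print(result)  # [('sprite_clicked', (101, 51))]
```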

#

claude skills are just text documents btw 🙂

wraith spruce
# rotund folio remember controlnets?

Ok, so, I can think of a potential controlnet style approach that might work. You would have to start by explaining to the LLM that it can inject events into the event queue, and you would have to explain how it can do this. You might be able to achieve the second by having it put the Pygame documentation into its memory buffer. From there, you could probably get it to do what I'm talking about by telling it not to create listeners but instead to just inject user events. Because LLMs are often stubborn though, you might have to manually remove unused Observer pattern code, because even telling it to leave that out might not work.

One problem I see a ton in image generation AI is the AI overlapping concepts. For example, if you tell it to make a person playing an electric guitar, it will dress the person as a rockstar, and it can be insanely difficult to get it to change the clothing to anything else. I had exactly this problem when I was trying to generate a traditional bard. Because of the modern D&D rockstar bard trope, the AI has confused "bard" with "rockstar" so badly that I was unable to get SD (2.5, I think) to dress the bard in anything but black leather.

wraith spruce
rotund folio
#

yeah so at least I think it would be worth trying

#

it could of course still not work, who knows!

#

diffusion models have a problem that their prompt and output space are not well aligned... there were some methods that helped with this like gligen, but stuff moves too fast

#

basically (IMO at least) parts of the prompt should be tied to areas of space in the image

#

that currently happens via attention only, which is why you can get replications of the same part of the prompt in multiple places

#

though more recent models are better about that

#

there should also be ways in which parts of the prompt are tied (by the user) to each other, rather than relying on attention

wraith spruce
#

Yeah, I'd certainly try it, just to see. Back when I was doing the image generation stuff, I frequently gave it prompts I didn't expect to work, just to test the limits of the AI. This made me way better at writing prompts, because it taught me what I could get away with.

I would do exactly the same thing with coding LLMs, for the same reason (and also just out of morbid curiosity).

rotund folio
#

so 'a red jacket and a green vase' doesn't make a red vase

wraith spruce
#

Right. And honestly, it wouldn't be that hard. Well, it would be tedious. Better image labeling would probably make a huge difference though.

rotund folio
#

oh the other funny thing you can do with diffusion models is if you kill a lot of the weights in the deep layers, it tends to suppress memorized images

#

and that can help with generalization; like, it's hard to get SD to make 'zuckerberg but old'

#

because there's too much zuck out there

wraith spruce
#

Interesting. I'll have to keep that in mind.

rotund folio
#

this is as of SDXL at least; qwen, flux, etc are different architectures

wraith spruce
#

Yeah, there is clear overtraining on some SD prompts. This is part of the bard problem.

#

Mostly I use SDXL now. I use 3.5 a bit, but my computer can only barely handle it. (I used to have two video cards. The one I used for the primary died, and now the NVidia card I bought for dedicated neural network use is pulling double duty. Sometimes the rendering duties will consume just enough memory that there isn't enough for SD3.5 until I reboot.)

rotund folio
#

yeah I think sdxl is a sweet spot

#

newer ones trade a bit too much flexibility for fidelity

#

I was having a hard time even getting qwen to do 'watercolor style'

wraith spruce
#

I agree. 3.5 does some good stuff for generating images of synthetic humans with glowing cracks in their perfectly smooth skin, but SDXL is great for most things I care about. (I was working on a tabletop RPG product with some friends, using SDXL to generate graphics and I tried some things that XL didn't do well on 3.5 for fun.)

rotund folio
#

yeah, I might even go back to 1.5 for some stuff in that sphere

#

or like pingpong back and forth between them in an upscale loop

#

in particular I think for backgrounds and cityscapes and things like that, more recent SDXL stuff has been more character focused

wraith spruce
# rotund folio or like pingpong back and forth between them in an upscale loop

I would actually really like to set up a workflow that uses multiple models like that. I think it is technically possible with ComfyUI, but I think it has to keep them all loaded in memory, so I would need to have enough memory to have all of the models I'm rotating through in memory at once. For SD 1.5, I don't think that would be an issue. SDXL might be a problem though, especially for print resolution images (and/or batches of 4 or more).

rotund folio
#

if you have ollama installed it can just grab stuff from huggingface

wraith spruce
#

I usually download from Huggingface in the browser, but it can be a pain finding the right file.

cold barn
# wraith spruce First, this is a major reason I'm not currently using LLMs. They don't handle v...

Totally agree, raw datasheets are noisy and LLMs fall apart fast with low level stuff. Register parsing was just the most approachable starting point since the data is "well defined" and the pain is obvious. Longer term I want to do full datasheet RAG with an MCP server on top so you can do precise per device lookups.

The PIO/interpolator thing is a good example of the harder problem though. I don't really know how you could tackle a problem like that outside of including examples from the internet as part of the knowledge base.

deep summit
cold barn
#

Not yet, I'll see if I can put the data sheets on a Git repo when I have some time.

rotund folio
#

so, this was a ridiculous experiment, but... I got Claude to play Frogger realtime, by running a server where various fetch URLs would enter in different commands. It took maybe four or five iterations but it eventually ended up winning.

#

its winning strategy was to write a script to play for it

spare tartan
rotund folio
#

A server returning an ASCII rendering of the stage, with different urls triggering commands when fetched
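A stripped-down sketch of that kind of setup, using only Python's standard `http.server`. Everything game-related here is invented for illustration (grid size, the `/up` `/down` `/left` `/right` paths, and the `handle`, `render`, and `make_server` names), and there is no traffic or real-time tick, so it shows only the "fetch a URL to issue a command, get ASCII back" shape.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

WIDTH, HEIGHT = 5, 4
frog = {"x": 2, "y": HEIGHT - 1}  # start on the bottom row
MOVES = {"/up": (0, -1), "/down": (0, 1), "/left": (-1, 0), "/right": (1, 0)}


def render():
    """Draw the board as ASCII, with F for the frog and . for empty cells."""
    return "\n".join(
        "".join("F" if (x, y) == (frog["x"], frog["y"]) else "." for x in range(WIDTH))
        for y in range(HEIGHT)
    )


def handle(path):
    """Apply the command named by the URL path, then return the board."""
    if path in MOVES:
        dx, dy = MOVES[path]
        frog["x"] = min(max(frog["x"] + dx, 0), WIDTH - 1)
        frog["y"] = min(max(frog["y"] + dy, 0), HEIGHT - 1)
    return render()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Fetching a move URL applies the command; any URL returns the board.
        body = handle(self.path).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), Handler)


# Exercise the routing directly, without opening a socket.
handle("/up")
board = handle("/left")
print(board)
```

To try it over HTTP, run `make_server().serve_forever()` and fetch `http://127.0.0.1:8000/up` from a browser or an agent's fetch tool.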

south fjord
#

Anyone successfully use PicoClaw on Raspberry Pi3B+?

rugged marsh
#

(Apologies if I've missed anything obvious) - is there a good repo or approach people are using to pull in skills for "standard dev practices"? Context: I'd like to put an agents.md across all my coding projects, that captures something like a "best practices dev flow" that's language agnostic. Then, based on the stack I'm using for any given project, additionally pull in any stack/language specific practice (e.g. python, C++, etc). Ideally I'd just like to follow something reasonable (e.g. google's style guide and best practices) but I'm not sure if someone has already taken the time to package this into a nice set of prompts I can pull in?

rotund folio
#

Alright, that was about 3 weeks and change, but I have a release of the midi arranger thing I've been coding with Claude. Windows build configuration is definitely something I will be happy to remain 'not wanting to actually learn about'! But it seems like it works (though of course it will pop a Windows Defender warning since it's a brand new executable and I'm not paying whatever $500/year for a cert from MS): https://github.com/ngutten/arranger/releases/tag/v0.1.1

GitHub

What's Changed

This is the first actual release. Trying an AppImage approach for Linux to avoid requiring the user to compile the audio server backend part themselves, but may change this in t...

#

it was, uh, interesting to actually try to get this in some distributable form that doesn't require compilation

#

and tbf I will probably have to take this down and do it all again since I'm sure I've done something incorrectly somewhere...

deep summit
#

ASIC for inference is much faster and power efficient

rugged marsh
deep summit
#

I think another reason to try LLM agents is to discover how much they still cannot do. It's easy to imagine them being able to do everything if you don't try them. Yesterday, I spent all day doing test tweaks because the agent didn't do it well. One example is that it insisted on retrying to fix a flaky test instead of fixing the root cause.

rotund folio
#

heh

#

with the understanding that this might actually end up getting published somewhere or make it into some official thing somewhere (and that I might get paid for contributions to it, in case you don't want to do free labor) - what sorts of things do you wish people actually benchmarked about how these things work, versus e.g. 'SWEBench accuracy' and things like that?

#

I might have a bite on that 'quantify different development strategies in terms of code maintainability and extensibility' proposal I've been floating around

deep summit
#

I'm not sure we know how to use it now and therefore we can't know how to benchmark it.

tulip owl
#

Adding agentic programming to an already polished skillset is no doubt amazing. Delegating tedious boilerplate tasks to a coding robot must indeed be extremely empowering, especially when invoicing a C‑note per hour with more prepaid tokens available than one can possibly spend.

But I'm not yet capable of writing advanced code myself, so the mountain of study ahead of me now feels insurmountable. Not only do I still have reams of knowledge left to acquire before I can even evaluate an agent's voluminous output, but corralling the imprecise agent itself grows more dizzyingly complicated by the day.

Admittedly, I'm probably stuck in the worst catch-22 position of all, where agents have obliterated my years of existing progress, but before I had attained the expertise required to direct and sagely adjudicate the self-programming machine. And since coding is "solved" now, any further attention paid to that antiquated abstraction layer feels wasted.

I'm sure future generations who rely solely on AI for every single aspect of their entire lives will inherently master it, but the best I can hope to achieve at this point is becoming just another slop vibecoder, blindly issuing prompts without ever truly understanding either the underlying system being programmed or the vague methodology used to do it.

deep summit
#

Coding is definitely not solved

naive scroll
# tulip owl _Adding_ agentic programming to an already polished skillset is no doubt amazing...

Do you want to make a living coding, or is this a hobby? I don't understand why you think "agents have obliterated my years of existing progress". I don't see that your knowledge is obliterated. The expertise you are garnering helps you to code with or without LLM assistance. Learning to play an instrument was not obsoleted by the invention of radio or recorded music. Don't fall prey to impostor syndrome.

tulip owl
# deep summit Coding is definitely not solved

I was referring to a recent YouTube video where Boris Cherny, developer of Claude Code, discusses our current moment in agentic programming. https://www.youtube.com/watch?v=We7BZVKbCVw

And for those investigating closing the embedded development loop, a new nRF Agent debugger captures BLE stack RTT/UART logs for diagnostic analysis (since this is likely on @deep summit's front burner). https://github.com/adsumnetworks/nRF-AI-Debugger

Boris Cherny is the creator and head of Claude Code at Anthropic. What began as a simple terminal-based prototype just a year ago has transformed the role of software engineering and is increasingly transforming all professional work.

We discuss:

  1. How Claude Code grew from a quick hack to 4% of public GitHub commits, with daily active users...
GitHub

Open-Source AI Agent for debugging nRF Connect SDK applications in VS Code. Generates logging code, captures live RTT/UART logs, and provides expert AI analysis. - adsumnetworks/nRF-AI-Debugger

tulip owl
# naive scroll Do you want to make a living coding, or is this a hobby? I don't understand why ...

I appreciate the encouragement, @naive scroll. But I'm absorbing the sudden breakneck influx of information, trying to wisely choose how to invest my limited time and resources, and I'm nearly paralyzed.

How much more should I keep studying doubly-linked lists, versus the latest agents.md techniques? Do I pick one LLM model and focus on it exclusively, or spread my exploration around? It's overwhelming.

And while I do enjoy writing code, I don't like wasting time, so hobby or not, I want to spend my neurons on things that are both fun and productive. You've obviously mastered advanced programming, so an agent is likely a huge assistance to you, but if AI only amplifies ingenuity, then I fear I can only contribute louder noise, not signal.

deep summit
#

That debugger is a neat idea but it's also likely slop. I was looking at a bunch of kanban projects and they all look similar to that. I wouldn't trust it until it appears humans are actually looking at it

tulip owl
deep summit
#

I gained that from looking at LLM output. You can too

tulip owl
naive scroll
#

I learned to program partly by reading books, but really mostly from reading existing code (that I trusted), and writing. That was the attitude of my CS professors; they expected their students would have summer or school-time jobs or projects that would be the apprenticeship towards real skills.

If I needed to use doubly-linked lists early on in my learning, I would have looked at an existing implementation, whether it be in code I was looking at, or in a textbook. If an LLM can produce that code reliably, and you can read and understand it, then that's another way.

The LLMs are not substituting for your knowledge, any more than listening to well-played music will teach you guitar chord positions.

#

Adafruit has found that (before LLMs), people who had a project they wanted to do would take an existing project and try to make it more like what they want. They might have essentially no experience at coding, but we provide examples in the Learn Guides, and that code is a stepping stone to knowledge. Sometimes the person just wants to get the project done and never gains a full (or even much) understanding of the code, sometimes they use it as a learning stepping stone.

deep summit
tulip owl
# deep summit I disagree. My take on that repo is based on me looking at other recent repos. T...

I'm not sure what you're disagreeing with. It is your existing expertise that enables you to filter worthwhile tools from vibe slop, whereas to my naive eye that repo seemed like the exact closed loop jumperless breadboard idea that you expressed a few weeks ago, except for nRF BLE. I also appreciated its open-source license (the proprietary aspect of AI is a rant I've determinedly avoided).

tulip owl
# naive scroll I learned to program partly by reading books, but really mostly from reading exi...

I learned to program just as @naive scroll did, by reading others' code and poking it to determine how it worked. Starting on the venerable Apple ][, I moved through Assembly, Basic, C on CP/M and Unix, Visual Basic, Processing, HTML/CSS with Javascript, Arduino, Micro/CircuitPython, and lastly Zephyr.

But I'm still no expert; wide but shallow knowledge. In fact, the similarity between all those platforms was the easy part, but the deepdive details often still elude me. And I'm hesitant to spend any more effort learning said details because the software world is clearly shifting, and fast.

deep summit
#

@tulip owl It isn't clear to me what your end goal is. Is it to learn computers or to make something to do something?

tulip owl
rotund folio
spare tartan
# tulip owl I learned to program just as <@329766224093249548> did, by reading other's code ...

You might find it more useful or interesting to ask the LLM, as a learning-around-the-subject tool, to identify patterns and techniques [or syntax] that would be non-obvious to new/mid-level users. Either as a review step for you after the LLM does some work, or just to explore some code.
If you ever see anything fly by [or in the git diff] that's gibberish/magic [like question mark operator or non-obvious functions/syntax etc] then ask for explanation+examples and then you visit the Lang docs page for that thing [read around the subject].

I find it interesting to watch pull requests in repos that are technically interesting, see what others are doing [only those I've followed after seeing their subject area / code + discussion style]

midnight radish
#

I've been programming for two decades but only started using a coding assistant last week. I'm still on the fence about it. It doesn't necessarily help me write well, but it does embolden me to try more ambitious projects than I would have. However, I am finding that it takes a lot of coaxing to actually get what I want, and it relies heavily on the "does it compile" metric rather than a sane analysis of how to approach a problem.

#

It's a little jarring to switch from role of programmer to that of a reviewer and be bombarded with lines of code I didn't write

timber jungle
#

The thing with AI is that it can't really seem to build full apps by itself yet. I've tried. My Copilot got caught up in old versions of libraries in a web app and in a 1k LoC Rust codebase. I tried applying to upwork (never went through with it) and they had you evaluate AI while asking traditional computer science questions. I don't see it replacing experienced programmers anytime soon. Gen AI is another tool for programmers that's currently overhyped. People use it to write but that doesn't make it the best writer. And I don't believe it should ever replace juniors, but people are trying I guess. Then there's issues of e.g. hallucination, sycophancy, security if not implemented properly, safety. That being said I still had success with Copilot and Stitch AI with guidance. And I want to integrate AI summarization for blog posts onto my site when I can figure out how to secure my Gemini API key properly. Local AI home assistants seem promising imo. That's something I still want to try.

rugged marsh
#

Gemini CLI (and other coding systems) have a setup you can pull in (plan mode in Gemini CLI) that sets up various prompts to use this approach.

Otherwise, it is indeed prone to throwing out big unstructured changes.

Make it create Tasks you know are small changes, review the tasks, and mostly let it auto-run each task, waiting until its tests pass. Each change is then much easier to review and correct if needed.

rotund folio
#

I made this mostly with Sonnet 4.5, some Opus 4.6, over about 3 weeks

timber jungle
# rotund folio I mean, do you count something like this as a 'full app'? https://github.com/ngu...

I stand corrected. Full apps are possible. AI is moving faster than I thought lol. Cool app. I know someone working on a couple of music apps as side projects with Tone.js and the Web Audio API, also created with AI. He doesn't like using the Web Audio API directly though. The nodes require some setup and I guess timing/creating and deleting resources can be tricky. I guess what's important is highlighting what people can do with AI. Even if AI is running a bunch of stuff independently I believe there should still be humans in the loop. Stitch AI to Google AI studio is my new favorite AI design workflow but it doesn't seem to like tailwind 4 in my testing. Stuff like that should get better though.

rotund folio
#

well I think ultimately, 'make things' is good 🙂

#

For this it started with a web UI version (not my choice, Opus going a bit nuts from not having Python UI libs installable), but that wouldn't have worked well with live editing and realtime rendering I think

#

so Web Audio had to go

#

this has been through something like three major refactors, which was an interesting experience... some things are still quite hard, other things just go very fast

#

like, I probably spent more turns handholding it through 'set up a windows build on GitHub' than 'write the entire arranger'

timber jungle
#

@rotund folio Ugh I still don't know how to set up a windows build on GitHub lol. That's cool though thanks for sharing!

stark saddle
prime kettle
swift junco
#

If you're doing any OSS project maintenance, I encourage people to apply for free Claude maintainers access - https://claude.com/contact-sales/claude-for-oss - don't get hung up on the stars or download counts bit, the second point of "If you maintain something the ecosystem quietly depends on, apply anyway and tell us about it" is intended to cover others.

midnight radish
#

I don’t know enough to choose between Claude, GPT, Gemini, or Grok, so I set it to “auto” mode in vscode. But I’m finding the assistant to be a bit unruly in doing things I didn’t ask for. Today I asked it to convert a function into Cython. It did so, but also removed the tqdm progress bar which was my metric for seeing how fast it ran.

rotund folio
#

that's probably not claude; claude's a hoarder

midnight radish
#

Ok. I’ll switch to Claude and see if it behaves. Do you notice a big difference between the versions?

rotund folio
#

yeah, sonnet 4.5 is my favorite for efficiency because I'm cheap, but it has some limits on very complex stuff; opus 4.6 and sonnet 4.6 are both better for more complex architectural things but of course not perfect

#

opus 4.6 will consume like 5x or more tokens than sonnet 4.5 for the same task

#

so for example, I recently added a feature 'singing synthesis' to my midi arranger; massive feature, had to re-architect how some parts of the engine worked (including a prerender stage), so I did the whole thing with opus to avoid mistakes that would break stuff or dead ends

#

sonnet 4.5 would have maybe gotten it done but I would have expected lots of bugs and broken things as it went

midnight radish
#

For embedded I found that test-driven development helps a lot, and that the assistant can actually write pretty good tests. In general it’s easier to verify the correctness of the test than the code under development.
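As a small illustration of why the test can be easier to verify than the code: the expected values below can be checked by hand against a bit layout, independent of how the decoder is written. The 8-bit status register, its field names, and `decode_status` are all hypothetical, not taken from any real datasheet.

```python
import unittest


def decode_status(reg):
    """Decode a hypothetical 8-bit status register into named fields."""
    return {
        "ready": bool(reg & 0x01),        # bit 0
        "error": bool(reg & 0x02),        # bit 1
        "mode": (reg >> 2) & 0x03,        # bits 3:2
        "fifo_level": (reg >> 4) & 0x0F,  # bits 7:4
    }


class DecodeStatusTest(unittest.TestCase):
    def test_all_clear(self):
        self.assertEqual(
            decode_status(0x00),
            {"ready": False, "error": False, "mode": 0, "fifo_level": 0},
        )

    def test_mixed_fields(self):
        # 0xA9 is 0b1010_1001: fifo_level=10, mode=2, error=0, ready=1
        self.assertEqual(
            decode_status(0xA9),
            {"ready": True, "error": False, "mode": 2, "fifo_level": 10},
        )


suite = unittest.defaultTestLoader.loadTestsFromTestCase(DecodeStatusTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a test-first flow the two test methods would be written and reviewed before `decode_status` exists, and the assistant's implementation is accepted only once they pass.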

rotund folio
#

I think it really depends on what you're doing, but there are times for different strategies... I want to quantify this actually eventually

#

I generally have found that (at least the Claude family) models are much less efficient at debugging than they are at implementation

#

I'll ask for observability before just letting them run with tests of their own making

#

but I'm doing UI-heavy and perceptual stuff

#

where there's no formal verifiable criterion other than 'does it sound good?' sometimes

#

oh, I will say, sometimes Opus can get stuck on wanting to run the code itself and can make weird decisions to force that to be possible

midnight radish
#

I’m usually doing back-end. But actually I need to try some front-end stuff, because I’m 5x slower at writing Javascript

timber jungle
#

Copilot GPT-4.1 just happily deleted my main loop trying to fix python indentation errors from a bad patch. I wish Pro had more premium credits EDIT: I was able to restore it redownloading it from GitHub... make sure you back changes up lol

rotund folio
#

lol

#

Yeah I make sure to commit and push everything before starting any new thing

#

even then, you have to be careful of things getting edited in place in long multi-turn sessions

midnight radish
rotund folio
#

one upside of Claude Code is that it does this automatically (makes a dedicated branch) but I actually tend to just use the web UI most of the time

midnight radish
#

Web UI? For GitHub? I only do that for READMEs. I always make some mistake in the markdown 😅

rotund folio
#

I just give it the code files I want it to use in particular, ask for the feature, download those to my drive, test, and commit/push myself

midnight radish
#

Oh, you meant the web UI for Claude. I didn’t know that existed

rotund folio
umbral harness
#

Don’t know if it’s just me.. but does anyone else feel guilty when they get an llm to write a function or something that just works first time.. like “should I have just taken 5 minutes to write it rather than get it done for me in 5 seconds?”

rotund folio
#

Not anymore, I adjusted to that pretty quickly. But I do feel nervous when the LLM starts explaining why it thinks a bug is occurring and there's stuff in the explanation that is nowhere near how I would have implemented the thing, or uses weird lingo

#

It's like, wait, we're using mutexes? Since when was this threaded?

#

... why do we have HTTP serving access points for this fluid dynamics simulator?

#

(ok that one is made up, but I did have Opus just decide to make things a web app in the middle of a feature implementation once)

#

or like when it just spontaneously decided to start making little interactive websites to illustrate its points during a planning/discussion session

south python
#

is prompt sanitation still a largely unsolved problem?

#

feels like even the big names still suffer prompt attacks by way of rewording or fancy language

rotund folio
#

I kind of want to come up with a really zany way to represent the logic flow of something now, put it in an image, and just see what comes out the other end

#

like, sketches of things with arrows in some tree, no words or explanations given

#

maybe the arrows sometimes end in balls or boxes or bars, or sometimes they squiggle, or sometimes they end on another arrow

#

and just like 'interpret this as a program please'

rotund folio
#

ok, lol, I told it to write this as a program; it gave me a little website with butterflies flying around and pollinating flowers in a loop

#

I wouldn't maybe suggest that this would replace natural language, or code 🙂

south python
#

senior enterprise engineer sitting at his desk with crayons and craft paper, vibe coding the next killer app

rotund folio
#

I kind of want to make a game where you have to draw runic networks and magic circles and hope Claude can interpret your nonsense as the actual script that solves a level

#

Code pictionary

rotund folio
#

Ok, I got Opus to interpret this (and other made-up diagrams) as signal processing circuits. The funny thing is it invents new code-level things when unsatisfied with representing its ideas for how to interpret shapes. New visual grammar for audio effects?

#

maybe interesting for an art piece or something... draw something and it's turned into a musical instrument you can play

cold barn
viscid prism
midnight radish
#

Oh wow, this is yours! I may be trying it out.

deep summit
south python
deep summit
#

I totally agree with this:

I never generate more than I can review in a sitting.

#

I think we'll figure out how to work well with LLMs as a tool