#@Erik Sipsma ack - i'm rebased on `main`

1 messages ยท Page 1 of 1 (latest)

flint moss
#

I hit it seemingly very randomly, can't repro when I actually want to. Last night I ran a loop trying to repro it with the tweaks in place and ctrl-C'd after a few hours with no success in hitting it again. So would want more data points before assuming it's gotten worse and not just your turn at the freak occurence ๐Ÿ˜„

hollow wing
#

fair - i assumed it was more likely to happen under load so was a bit surprised to see it happen immediately

#

exiting the command and restarting it works fine

#

kind of feels like a deadlock? are we not doing BEGIN IMMEDIATE everywhere?

#

(can check once i pop the stack)

#

looks like the code tries to, but it's all just URL params so hard to say if it's being respected:

        RawQuery: url.Values{
            "_pragma": []string{
                "foreign_keys=ON",    // we don't use em yet, but makes sense anyway
                "journal_mode=WAL",   // readers don't block writers and vice versa
                "synchronous=OFF",    // we don't care about durability and don't want to be surprised by syncs
                "busy_timeout=20000", // wait up to 20s when there are concurrent writers
            },
            "_txlock": []string{"immediate"}, // use BEGIN IMMEDIATE for transactions
        }.Encode(),
flint moss
#

wonder if we're hitting that somehow

#

I also was wondering if the fact that we open the underlying DB multiple times (shared one for writing, but each sse handler re-opens for itself, which makes sense in that the client may be gone) could somehow be doing something funky? fwict that should be totally valid, but it raised my eyebrow a millimeter or two

#

the fact that it's transpiled sqlite makes me slightly more suspicious about something weird going on there, wouldn't be surprised if there's not many users of the modernc lib that open db multiple times like we do, which raises the chances of us hitting a bug not noticed previously

hollow wing
#

possibly, yeah i'm not sure how the difference shakes out vs. having a single read/write db connection

flint moss
# flint moss Yeah I kind of doubt at this point it has to do with actual load, it seems like ...

Actually I just went to the docs that post links to: https://www.sqlite.org/wal.html#sometimes_queries_return_sqlite_busy_in_wal_mode

And some of the scenarios do sound like something we could hit?

  • If another database connection has the database mode open in exclusive locking mode then all queries against the database will return SQLITE_BUSY. Both Chrome and Firefox open their database files in exclusive locking mode, so attempts to read Chrome or Firefox databases while the applications are running will run into this problem, for example.

  • When the last connection to a particular database is closing, that connection will acquire an exclusive lock for a short time while it cleans up the WAL and shared-memory files. If a separate attempt is made to open and query the database while the first connection is still in the middle of its cleanup process, the second connection might get an SQLITE_BUSY error.

#

That makes me wonder more if the opening of the db several times is a culprit...

#

I feel like it wouldn't be that hard to avoid doing that. We'd need some light ref counting around a shared db so we can know when to leave it open and when to close, but probably not too bad?

hollow wing
#

they're already per-client so we can probably just keep a single one open for the lifetime of the client

flint moss
# hollow wing they're already per-client so we can probably just keep a single one open for th...

my understanding, which may be wrong, is that we do a separate open in the sse handler because it's possible that the client in question closed down, in which case the single shared one won't be around anymore https://github.com/dagger/dagger/blob/979ed22e8763953a0745b4beeeddb16a73aea0a9/engine/server/telemetry.go#L620-L620

GitHub

An open-source runtime for composable workflows. Great for AI agents and CI/CD. - dagger/dagger

#

if that's not true then even better

hollow wing
#

ah, it's to avoid it being closed, so the client can safely read the last few bits of telemetry (draining, my old friend)

hollow wing
flint moss
hollow wing
#

haven't started it, go for it! ๐Ÿ™