#"database system is starting up" error when testing

1 messages · Page 1 of 1 (latest)

bitter hazel
#

Currently, when running the tests for my application each run a couple tests fail with the error
warning codes.vapor.application : [AppTests] PSQLError(code: server, serverInfo: [sqlState: 57P03, file: postmaster.c, line: 2334, message: the database system is starting up, routine: ProcessStartupPacket, localizedSeverity: FATAL, severity: FATAL])
Upon retry the specific tests always works. (Each run different tests fail)
I was able to reproduce this error by creating a new vapor application from the template with postgres and then just adding the following test:

    func testGetExistingTodo() async throws {
        let app = Application(.testing)
        do {
            defer { app.shutdown() }
            try await configure(app)
            try await app.autoMigrate()
            
            let todo = Todo(title: UUID().uuidString)
            try await todo.create(on: app.db)
            
            try app.test(.GET, "todos") { res in
                XCTAssertEqual(res.status, .ok)
                XCTAssertContent(Array<Todo>.self, res) { content in
                    XCTAssert(content.contains { $0.id == todo.id! })
                }
            }
        } catch {
            app.logger.report(error: error)
            throw error
        }
    }

If I now start the database in the docker compose file with docker compose up db and run the test repeatedly until failure (right click on the test diamond on the left in XCode) one in about five hundred tests fails with the above error message.
What can I do about this error? Maybe a postgres configuration, or another way to solve this? Should I create an issue on Github?

peak dirge
#

It sounds like a race condition if different tests fail each run

#

I've also wondered why the tests don't use the setup and teardown methods it seems, and if that could make a difference when running many test cases. There might be some more certainties afforded in the protocol.

#

as it sounds like the db was started when it was already starting and doesnt support that, and each test is seting up and tearing down all in the test case.

#

since they are async they will do it all at once. I think i will experiment on this real quick.

#

so far i made a new project with the toolbox, with postgres added ur test to AppTests and ran. results : Test Case '-[AppTests.AppTests testGetExistingTodo]' started. 2023-07-10T11:20:50-0500 warning codes.vapor.application : [AppTests] PSQLError(code: server, serverInfo: [sqlState: 28P01, file: auth.c, line: 329, message: password authentication failed for user "vapor_username", routine: auth_failed, localizedSeverity: FATAL, severity: FATAL]) /Users/jimmyhoughjr/workspace/work/testRace/Tests/AppTests/AppTests.swift:34: error: -[AppTests.AppTests testGetExistingTodo] : failed: caught error: "PSQLError – Generic description to prevent accidental leakage of sensitive data. For debugging details, use `String(reflecting: error)`." Test Case '-[AppTests.AppTests testGetExistingTodo]' failed (0.633 seconds). Test Case '-[AppTests.AppTests testHelloWorld]' started. 2023-07-10T11:20:51-0500 info codes.vapor.request : request-id=FC3568D3-084C-4308-9651-DA8F20709C06 [Vapor] GET /hello Test Case '-[AppTests.AppTests testHelloWorld]' passed (0.004 seconds). Test Suite 'AppTests' failed at 2023-07-10 11:20:51.294. Executed 2 tests, with 1 failure (1 unexpected) in 0.636 (0.636) seconds Test Suite 'testRacePackageTests.xctest' failed at 2023-07-10 11:20:51.294. Executed 2 tests, with 1 failure (1 unexpected) in 0.636 (0.636) seconds Test Suite 'All tests' failed at 2023-07-10 11:20:51.294. Executed 2 tests, with 1 failure (1 unexpected) in 0.636 (0.637) seconds jimmyhoughjr@Jimmys-Mac-mini testRace %

#

So it seems I dont have the same reproduction as you do.

#

I believe the db setup can be done differently and that will fix this. I need to redownload and install Xcode though. You could look at the vapor repo for some of its tests.

#

one thing i see that is different from your test and theirs is app.test() vs app.testable.test() when they make their requests like in AsyncAuthTests

bitter hazel
peak dirge
#

I’ve recently began paying more attention to the tests with websocket kit pr but haven’t much xp with how they setup testing

#

When you had one test you manually clicked 500 times for it to fail or did you automate ?

bitter hazel
bitter hazel
peak dirge
#

Derp yeah I skipped that part. I’ve not eaten today and I burn through sucrose like air

bitter hazel
peak dirge
#

😄

#

That’s neat

#

I’ll rerun and see what I see.

peak dirge
#

strangely, the env in the compose file and the default values seem to be correct yet auth is still failing for vapor_username

#

not sure i really feel like chasing that down given the last five weeks of my life.

#

as. it makes zero sense for auth not to work with the preconfigured credentials that I can think of

bitter hazel
#

Is there maybe running another db on that port? You could change the exposed port of the docker db and also adjust it in the env file. And if you are using Xcode you should set the custom working directory for it to actually find the env file

peak dirge
#

It shouldn’t be running anything else on that but I’ll see. I restarted and walked outside to contemplate existing or not

bitter hazel
#

Ok, take your time, thanks for your help so far

#

I just looked in to testable, and it does not solve the problem, in fact in my actual application tests I am using app.testable().test I just forgot to add it to this example

#

Also, at times what seems to be a race condition appears during the application shutdown, what seems weird to me:

2023-07-10T19:42:26+0200 debug codes.vapor.application : [Vapor] Application shutting down
2023-07-10T19:42:26+0200 trace codes.vapor.application : [Vapor] Shutting down providers
2023-07-10T19:42:26+0200 trace codes.vapor.application : [Vapor] Clearing Application storage
2023-07-10T19:42:26+0200 trace codes.vapor.application : [Vapor] Shutting down EventLoopGroup
2023-07-10T19:42:26+0200 trace codes.vapor.application : [Vapor] Application shutdown complete
2023-07-10T19:42:26+0200 warning codes.vapor.application : [AppTests] PSQLError(code: server, serverInfo: [sqlState: 57P03, file: postmaster.c, line: 2334, message: the database system is starting up, routine: ProcessStartupPacket, localizedSeverity: FATAL, severity: FATAL])
2023-07-10T19:42:26+0200 trace codes.vapor.application : [Vapor] Application deinitialized, goodbye!
peak dirge
#

ive oten had weirs shutdown behavior for as long as ive used vapor

#

usually anytime an error response was generated it would take too long to sshutdown and warn about it

#

but idk if ctrl c would be the same signals.

#

but i like your style of logging to see wtf things are actually doing

bitter hazel
peak dirge
#

and its weird to see it starting up in a shutdown

#

id notice when i would ctrl c

#

that id have to hit it and wait soemtimes 30 + seconds for it to register

bitter hazel
peak dirge
#

ether way kudos to both

bitter hazel
peak dirge
#

I can recall even in the old days that db access in paralellized tests can cause these kinds of issues

#

but it was usually more senior ppl dealing with those decisions

#

and noone wants to pay for tests really

bitter hazel
#

Ok, so taking a closer look in the logs, the error didnt occur during the shutdown but before, in the trace logs it has already been logged before once:

2023-07-10T19:58:36+0200 trace codes.vapor.application : database-id=psql psql_connection_action=sendStartupMessage(AuthContext(username: "vapor_username", password: ********, database: "vapor_database")) psql_connection_id=727 [PostgresNIO] Run action
2023-07-10T19:58:36+0200 trace codes.vapor.application : database-id=psql psql_connection_action=wait psql_connection_id=727 [PostgresNIO] Run action
2023-07-10T19:58:36+0200 trace codes.vapor.application : database-id=psql psql_connection_id=727 [PostgresNIO] Channel read event received
2023-07-10T19:58:36+0200 trace codes.vapor.application : database-id=psql psql_connection_action=read psql_connection_id=727 [PostgresNIO] Run action
2023-07-10T19:58:36+0200 trace codes.vapor.application : database-id=psql psql_connection_id=727 psql_message=.error(PostgresNIO.PostgresBackendMessage.ErrorResponse(fields: [Routine: "ProcessStartupPacket", Localized Severity: "FATAL", Code: "57P03", Severity: "FATAL", File: "postmaster.c", Line: "2334", Message: "the database system is starting up"])) [PostgresNIO] Backend message received
2023-07-10T19:58:36+0200 trace codes.vapor.application : database-id=psql psql_connection_action=closeConnectionAndCleanup(PostgresNIO.ConnectionStateMachine.ConnectionAction.CleanUpContext(action: PostgresNIO.ConnectionStateMachine.ConnectionAction.CleanUpContext.Action.close, tasks: [], error: PSQLError(code: server, serverInfo: [sqlState: 57P03, file: postmaster.c, line: 2334, message: the database system is starting up, routine: ProcessStartupPacket, localizedSeverity: FATAL, severity: FATAL]), closePromise: nil)) psql_connection_id=727 [PostgresNIO] Run action
#

The shutdown happens afterwards and the error message is there again probably due to the do {} catch in the test function

peak dirge
#

I think that is probable.

#

i see the close action there

#

so it still seems to point to closing beore its done starting somehow, if that is truly what is happening

#

I think i see run actions, read actions adn the close actions overlapping on the same connection maybe

bitter hazel
#

I could upload the entire log file if it helps but I dont have enough knowledge of postgres and postgresnio to get more value from the logs myself

peak dirge
#

we may have enough of a pattern in what is there

#

im still trying to see why my postgres port is in use when it sholdnt be after multiple restarts

bitter hazel
#

Ok, thanks so far. For debugging the port lsof -i :5432 might be able to help

peak dirge
#

cathcing up not sure where the postgres install came form but killed it

#

blinkenlights

#

ran 100, now running 500

#

well no crash in 500 times

#

this is the postgres im using

#

starting PostgreSQL 15.3 on aarch64-unknown-linux-musl, compiled by gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924, 64-bit

bitter hazel
#

Ok, well thats weird, for me sometimes it got lucky and didint fail with 500 tried but most of the time one of them failed

peak dirge
bitter hazel
peak dirge
#

maybe i guess you could see about upgrading to my versino and see it it has the same issue

bitter hazel
#

Yes I will try that. Are you using docker for the db? I just used the one that came with the template

peak dirge
#

yeah same here just the docker db in a terminal and running tests in xcode

#

that repo shold be the latest of what i have

bitter hazel
#

Ok, my docker just had an old image a new docker pull solved it. Will try to reproduce the error now

peak dirge
#

even if its not a bug, you found the cause possibly

#

I copied the first test renamed it and am running the whole suite 500 times

#

so see if parallel matters

bitter hazel
#

Yes, that seems to have been the cause of the problem. So far it works for me too. Thanks for all your help

peak dirge
#

no problem.

#

tell ur hiring managers lol

charred ledgeBOT
bitter hazel
peak dirge
#

I can sniff out bugs even when i dont know wtf im doing

bitter hazel
#

Oh no here we go again, the error came once more

peak dirge
#

race conditions arent so easily found :/

bitter hazel
#

I thought and hoped it was gone

peak dirge
#

is that in my testRace repo or your original code?

bitter hazel
#

It was in my test Repo. i just downloaded yours and will see there

peak dirge
#

yeah it might tell something

bitter hazel
#

Well, now I have a new error message: PSQLError(code: connectionError, underlying: read(descriptor:pointer:size:): Connection reset by peer (errno: 54))

#

Now on a retry I got the initial error again in your repo

#

Only thing I changed was the log level to trace

peak dirge
#

weird.

#

i still feel like maybe the test setup and teardown is the heart of the issue

bitter hazel
#

So what would you suggest to change?

peak dirge
#

im not sure at the moment, nothing stands out as of yet. It just feels like thats where the issue has to come from.

#

I wonder if it happens with another db driver

#

that could be a test maybe to see if it does it against a sqlite db

#

that could mean its in the swift side vs postgres

bitter hazel
#

Well, sqlite seems to work

#

At least I did not have problem with a total of 1500 tries

#

Were you able to reproduce the error with postgres?

peak dirge
#

I haven’t tried yet, what steps you recommend?

#

Also are your dependencies up date ?

#

I was going to suggest it’s not in the db but app tear down maybe but this info goes against that

#

Does ram do anything when it fails I wonder also

bitter hazel
#

Well, I downloaded your repo and ran the test repeatedly and after some tries it fails for me

peak dirge
#

How many iters?

bitter hazel
#

I used 500 and in most cases this was enough, but on some occasions all of them worked

peak dirge
#

And that is running one test or all of app tests ?

bitter hazel
#

Just the testGetExistingTodo test

peak dirge
#

So I’ve not gotten an error with 500 or 1k of those

#

How much ram u have ?

bitter hazel
#

You mean in general or for docker?

#

16 in general and i think 4 for docker

peak dirge
#

Either might be relevant

#

Ok I have 8 so it doesn’t seem to be low ram

bitter hazel
#

Yeah but what I am asking me is how we get different results when we have the same db using docker and the same project

#

Or rather what isnt working on my machine to cause the failures

peak dirge
#

yeah we are narrowing in on something maybe

#

im on mac os beta

#

so far we can reproduce a failure iwth postgres and not sqlite on your machine. thats isolated to your machine and maybe postgres

#

which maybe makes it not so unique.

#

let me try with sqlite on my end

wild bone
#

We found an issue in the new state machine in the PostgresNIO package a week ago or so. I’ve hacked together a fix for us at least but it’s not great. It seems that something in the CoW avoidance mechanism is causing an unexpected pointer deallocation and then the connection gets nuked

#

Or perhaps the other way around 🫠

wild bone
#

I’ve not really gone much deeper in but I wonder whether this might actually be related to the new move only value types and ownership traits being experimenting with in Swift 5.9? It seems plausible that this could cause some problems at least 😬

bitter hazel
#

Interesting to hear, I will try it with your package as a dependency then

bitter hazel
#

However, this seems to be another error, I can still see the original error happen

bitter hazel
peak dirge
#

Id think there are new keywords for the new semantics

bitter hazel
#

But a couple months ago when I last worked on that project I cannot remember such test failures, back then everything was fine. So it could be due to changes in the state machine in postgres nio

#

But it could as well be something else

bitter hazel
peak dirge
#

Well I assume you have to declare a type as move only or consume and stuff

bitter hazel
#

Well, I did not need to change anythin, I just installed the Xcode beta which uses the new swift version. Apart from that everything stayed the same on my end

peak dirge
#

Yeah I’m just saying I don’t think 5.9 changes any existing behavior

bitter hazel
#

Ah ok sorry, then i misunderstood

peak dirge
#

It could maybe but I don’t think it would force the new memory access onto existing declarations.
Id think that would break a lot

peak dirge
#

So I recall you introduced some delay to no effect. Maybe try more delay?

#

I also wonder if it can be isolated further by only creating the db once then iterating. That way the start/stop of the db is eliminated

#

that in combo with the sqlite tests working might mean its in postgres, ie it cant start and stop that fast

wild bone