#Receiving some string characters as ? characters from Postgres responses on a websockets connection

28 messages · Page 1 of 1 (latest)

swift lynx
#

I am using Rust to create a Postgres driver. When receiving raw Bytes from a websocket connection how can I properly decode it? I am making a TcpStream call to Postgres and get back a type BytesMut. I do String::from_utf8_lossy(&bytes[..]) to decode it. But that gives me stuff like this

id����createdAt@updatedAt@userName@��6fullName@��hdescription@���region@��6mainUrl���avatar@   ������Dv        12023-10-22 00:18:44.6872023-10-22 00:18:44.687jon
SELECT 1ZI    1     | John JamesI am a developer������������C

I believe those ? symbols represent pipes, because when my text does not have pipes it seems to parse fine without those symboles. But regardless what is the proper way to decode these bytes so that I get back carriage returns, spaces, etc? I tried utf16, but that did not work either (parsed into chinese). For example when using psql I get nicely formatted text like this screenshot.

sonic fiber
#

can you print out what it looks like as [u8]? You need to figure out what encoding it's using.

swift lynx
#

Hi I see this b"T\0\0\0\xf3\0\tid\0\0\0@\x01\0\x01\0\0\0\x14\0\x08\xff\xff\xff\xff\0\0createdAt\0\0\0@\x01\0\x02\0\0\x04Z\0\x08\0\0\0\x03\0\0updatedAt\0\0\0@\x01\0\x03\0\0\x04Z\0\x08\0\0\0\x03\0\0userName\0\0\0@\x01\0\x04\0\0\x04\x13\xff\xff\0\0\06\0\0fullName\0\0\0@\x01\0\x05\0\0\x04\x13\xff\xff\0\0\0h\0\0description\0\0\0@\x01\0\x06\0\0\x04\x13\xff\xff\0\0\0\xfe\0\0region\0\0\0@\x01\0\x07\0\0\x04\x13\xff\xff\0\0\06\0\0mainUrl\0\0\0@\x01\0\x08\0\0\x04\x13\xff\xff\0\0\0\xfe\0\0avatar\0\0\0@\x01\0\t\0\0\0\x11\xff\xff\xff\xff\xff\xff\0\0D\0\0\0v\0\t\0\0\0\x011\0\0\0\x172023-10-22 02:29:52.629\0\0\0\x172023-10-22 02:29:52.629\0\0\0\x03jon\0\0\0\nJohn James\0\0\0\x10I am a developer\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xffC\0\0\0\rSELECT 1\0Z\0\0\0\x05I"

sonic fiber
#

Well this is definitely not a text format. Where did this come from?

swift lynx
#

I'm playing around with pgcat. It's the format returned by Postgres which I guess is binary

wind zodiac
#

I think that might be a row description because of the T

#

but it's just a guess, I never worked with postgres' protocol

#

but that's definitely a binary format you're supposed to deserialize in some way, not text

sonic fiber
#

Well that seems pretty straightforward. Either write it yourself or find it already written somewhere.

wind zodiac
#

the string type is literally a null terminated string, like in the barbarian times

wind zodiac
#

I expected more from postgres lol

#

I'm pretty sure they just want to implement it by themselves

sonic fiber
#

It's better for serialization than for runtime strings

wind zodiac
#

yeah no

#

it's just bad

elfin solar
# swift lynx Hi I see this ```b"T\0\0\0\xf3\0\tid\0\0\0@\x01\0\x01\0\0\0\x14\0\x08\xff\xff\xf...

If you have bytes that aren't perfect UTF-8, do not lossily convert the bytes to string, as it will damage the bytes when you do that. represents a byte that's not proper UTF-8, like b'\xff' or b'\xf3, and there are a bunch of different invalid bytes, so if you convert each of them to the same character, you lose track of exactly which byte each used to be, which is why it's called lossy, because it loses information.

Keep it as a Bytes or convert it to a Vec<u8> or a u8 slice or array. Then, you can decode the actual bytes since you still have them.

edgy imp
wind zodiac
#

yeah the messages are framed which is okay

wind zodiac
#

strlen is not enough because a corrupted message might not have any 0 and go over the boundary

#

so it needs manual searching without any of the strlen tricks

swift lynx
#

As @sonic fiber stated I think I'll use postgres-protocol to do the grunt work

swift lynx
#

One more question if I decide to convert the bytes coming from a DataRow type into my own type the values could be numbers, strings, bytea, etc. Should I be using an rust Any type to handle that? What would be an appropriate rust type that could handle all the possible postgres types?

wind zodiac
#

an enum with all the variants presented in the docs

edgy imp
#

We use the text protocol at work instead of the binary protocol, and we turn it into json. But I don't recommend it as its a sizeable amount of work

#

An enum that can decode all values is indeed the easiest option