#BSONError: Invalid UTF-8 string in BSON document (mongodb)

graceful nova
#

I have a system where users submit "combos" via a modal, but sometimes input with weird characters causes MongoDB to break the "submits" collection, making it inaccessible

```
0|index    |   fruit: 'dough',
0|index    |   sword: 'ttk',
0|index    |   fightingStyle: 'superhuman',
0|index    |   gun: null,
0|index    |   combo: 'TTK X,�Superhuman C, Soru,�Dough�C,�Dough�V, TTK Z,�Superhuman�Z',
0|index    |   authorId: '1266381745062809642',
0|index    |   guildId: '1027544757037187163',
0|index    |   suggestionId: '#19D5549',
0|index    |   status: 'pending',
0|index    |   upVotes: [],
0|index    |   downVotes: [],
0|index    |   _id: new ObjectId('66e1520e80836efee93e1f99')
0|index    | }
0|index    | {
0|index    |   fruit: 'dough',
0|index    |   sword: 'ttk',
0|index    |   fightingStyle: 'superhuman',
0|index    |   gun: null,
0|index    |   combo: 'TTK X,�Superhuman C, Soru,�Dough�C,�Dough�V, TTK Z,�Superhuman�Z',
0|index    |   authorId: '1266381745062809642',
0|index    |   guildId: '1027544757037187163',
0|index    |   suggestionId: '#19D5549',
0|index    |   status: 'pending',
0|index    |   upVotes: [],
0|index    |   downVotes: [],
0|index    |   _id: new ObjectId('66e1520e80836efee93e1f99'),
0|index    |   createdAt: 2024-09-11T08:17:18.874Z,
0|index    |   updatedAt: 2024-09-11T08:17:18.874Z,
0|index    |   __v: 0
0|index    | }
```

```ts
0|index    | Error saving combo: BSONError: Invalid UTF-8 string in BSON document
0|index    |     at parseUtf8 (/root/bfc/node_modules/bson/lib/bson.cjs:148:19)
0|index    |     at Object.toUTF8 (/root/bfc/node_modules/bson/lib/bson.cjs:273:21)
0|index    |     ... 6 lines matching cause stack trace ...
0|index    |     at [Symbol.asyncIterator] (/root/bfc/node_modules/mongodb/lib/cursor/abstract_cursor.js:159:45)
0|index    |     at AsyncGenerator.next (<anonymous>) {
0|index    |   [cause]: TypeError: The encoded data was not valid for encoding utf-8
0|index    |       at TextDecoder.decode (node:internal/encoding:443:16)
0|index    |       at parseUtf8 (/root/bfc/node_modules/bson/lib/bson.cjs:145:37)
0|index    |       at Object.toUTF8 (/root/bfc/node_modules/bson/lib/bson.cjs:273:21)
0|index    |       at deserializeObject (/root/bfc/node_modules/bson/lib/bson.cjs:2952:31)
0|index    |       at internalDeserialize (/root/bfc/node_modules/bson/lib/bson.cjs:2863:12)
0|index    |       at Object.deserialize (/root/bfc/node_modules/bson/lib/bson.cjs:4335:12)
0|index    |       at OnDemandDocument.toObject (/root/bfc/node_modules/mongodb/lib/cmap/wire_protocol/on_demand/document.js:208:28)
0|index    |       at CursorResponse.shift (/root/bfc/node_modules/mongodb/lib/cmap/wire_protocol/responses.js:207:35)
0|index    |       at FindCursor.next (/root/bfc/node_modules/mongodb/lib/cursor/abstract_cursor.js:222:41)
0|index    |       at [Symbol.asyncIterator] (/root/bfc/node_modules/mongodb/lib/cursor/abstract_cursor.js:159:45) {
0|index    |     code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
0|index    |   }
0|index    | }
```

https://i.imgur.com/jSNMIky.png
https://i.imgur.com/ihwsC6f.png

https://sourceb.in/CF6WOhkVRy

cedar lotus
#

You might want to ask this in a Mongo-focused Discord server

#

it appears that somehow you're generating incorrectly encoded BSON

#

There are two modes a UTF-8 parser can work in. In the first mode, when it encounters a malformed multi-byte sequence, it replaces the malformed sequence with the replacement character (U+FFFD, �)

#

if you resave a document with replacement characters in it, you can lose data

#

the second mode is: the parser bails out upon encountering a malformed multi-byte sequence

#

mongodb is parsing your BSON document, which has malformed multibyte sequences in it, in the second mode

#

to avoid silent data corruption, it throws.
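
The two parser modes described above can be reproduced with `TextDecoder`, which also appears in the stack trace. This is a minimal sketch, not the driver's actual code path: the byte sequence is made up for illustration and `decodeStrict` is a hypothetical helper name.

```ts
// Made-up input: "TTK", then a stray 0xA0 byte, then "X".
// 0xA0 is a continuation byte with no lead byte, so it is not valid UTF-8.
const malformed = new Uint8Array([0x54, 0x54, 0x4b, 0xa0, 0x58]);

// Mode 1 (lenient, the TextDecoder default): malformed bytes become U+FFFD.
const lenient = new TextDecoder("utf-8").decode(malformed);

// Mode 2 (strict): with `fatal: true`, decode() throws a TypeError instead.
// Hypothetical helper that turns the throw into a null result.
function decodeStrict(bytes: Uint8Array): string | null {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return null; // the decoder refused the input
  }
}

const strict = decodeStrict(malformed);

console.log(JSON.stringify(lenient)); // "TTK�X" (U+FFFD in place of 0xA0)
console.log(strict); // null
```

The lenient result no longer contains the original byte, which is why re-saving it would lose data.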

#

So somehow you're generating a BSON document with malformed multibyte sequences

#

do you have any idea how you might be doing something like that?

#

one common way to do this is via concatenating buffers with different encodings
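
As an illustration of how mixed encodings can produce malformed bytes, here is a sketch using Node's `Buffer`. The input text and the idea that a non-breaking space is involved are assumptions, not something confirmed in this thread; `isValidUtf8` is a hypothetical helper.

```ts
import { Buffer } from "node:buffer";

// Assumed input: a combo where a non-breaking space (U+00A0) follows the comma.
const head = Buffer.from("TTK X,", "utf8");
// Encoded as latin1, the non-breaking space becomes the single byte 0xA0.
const tail = Buffer.from("\u00a0Superhuman C", "latin1");

// The concatenated bytes mix two encodings and are no longer valid UTF-8.
const mixed = Buffer.concat([head, tail]);

// Hypothetical helper: strict decode, reporting success or failure.
function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

const mixedOk = isValidUtf8(mixed); // false
// Encoding the whole string as UTF-8 in one step keeps it valid.
const consistentOk = isValidUtf8(Buffer.from("TTK X,\u00a0Superhuman C", "utf8")); // true

// A lenient decode hides the problem behind a replacement character.
const lossy = mixed.toString("utf8"); // "TTK X,\uFFFDSuperhuman C"
console.log(mixedOk, consistentOk, JSON.stringify(lossy));
```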

graceful nova
#

for now I made a temporary solution with regex

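#

```ts
// Temporary workaround: strip every character outside an allow-list of
// ASCII letters, digits, whitespace, and common punctuation.
export function cleanString(string: string) {
    const pattern = /[^a-zA-Z0-9\s\(\)\-\_\[\]\{\}!@#$%^&*=\+;,'\.\/\?|\`~"<>:\\]/g;

    const cleanedString = string.replace(pattern, '');

    return cleanedString;
}
```
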
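
One way to sanity-check a filter like this is to encode its output and decode it again in strict mode. The sketch below assumes Node's global `TextEncoder` and `TextDecoder`; `roundTripsAsUtf8` is a hypothetical helper and the sample input is made up.

```ts
// Copy of the allow-list filter from the message above.
function cleanString(input: string): string {
  const pattern = /[^a-zA-Z0-9\s\(\)\-\_\[\]\{\}!@#$%^&*=\+;,'\.\/\?|\`~"<>:\\]/g;
  return input.replace(pattern, "");
}

// Hypothetical helper: true when the string survives encode + strict decode unchanged.
function roundTripsAsUtf8(text: string): boolean {
  const bytes = new TextEncoder().encode(text);
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes) === text;
  } catch {
    return false;
  }
}

// Made-up sample: a replacement character, a non-breaking space, a lone surrogate.
const raw = "TTK X,\uFFFDSuperhuman C,\u00a0Soru\uD800";
const cleaned = cleanString(raw);

// The filter drops U+FFFD and the lone surrogate, but `\s` in a JavaScript
// regex also matches U+00A0, so the non-breaking space passes through.
console.log(JSON.stringify(cleaned));
console.log(roundTripsAsUtf8(raw)); // false: the lone surrogate cannot be encoded as-is
console.log(roundTripsAsUtf8(cleaned)); // true
```

If the filtered output passes this check, the filter is probably not what produces the bad bytes. The stack trace above comes from the cursor's read path (`FindCursor.next` into `deserialize`), so documents stored before the filter was added could still trigger the error.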
cedar lotus
#

oh that's what you used to fix it, nevermind

graceful nova
#

I just selected them on my own

#

but it still broke

#

even with this code enabled

#

I thought it was impossible

graceful nova
#

I've now tried a few different solutions at once