#Detecting invalid JS strings


proud tendon
#

Is there any built-in way to "detect" invalid strings?

Here's an example:

// this is a lone (unpaired) surrogate, which isn't a valid character on its own
const bad = "\udc11"

// but i can console.log it:
console.log(bad)
// prints: �

// and i can use it in other strings:
const foo = bad + "-" + bad

// but when i try to evaluate it in the repl:
> foo
Unterminated string literal Unknown exception

Some questions:

  1. Can I somehow detect "bad" Unicode strings?
  2. Why can I console.log it, and what does it do?
  3. What happens in the Deno REPL that makes it throw an error?

wooden walrus
#

For now you can test it with !/\p{Surrogate}/u.test(str), which is true when str contains no lone surrogates.
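
A minimal sketch of that check wrapped in a helper. The name isWellFormedString is made up for illustration, and the String.prototype.isWellFormed() branch assumes a runtime new enough to ship that ES2024 method; the regex is the fallback:

```javascript
// Hypothetical helper name; not part of Deno or any library.
function isWellFormedString(str) {
  // Assumption: newer runtimes provide String.prototype.isWellFormed (ES2024).
  if (typeof str.isWellFormed === "function") {
    return str.isWellFormed();
  }
  // With the `u` flag the regex walks code points, so a valid surrogate
  // pair counts as one code point and only lone surrogates match.
  return !/\p{Surrogate}/u.test(str);
}

console.log(isWellFormedString("\udc11"));         // false (lone low surrogate)
console.log(isWellFormedString("\ud83d"));         // false (lone high surrogate)
console.log(isWellFormedString("a-\ud83d\ude00")); // true  (valid pair)
```

The `u` flag matters here: without it, \p{...} is not recognized as a property escape at all.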

#
  1. console.log() and the REPL have different printing implementations: console.log() is written in JS, while the REPL is mainly Rust. Rust's string type has to be valid UTF-8, so it can't hold the lone surrogates that a JS (UTF-16) string is allowed to contain.
#

Looks like what console.log() is essentially doing is encoding the JS string into UTF-8 with a "lossy" conversion, which turns each lone surrogate (ill-formed UTF-16) into the replacement character (U+FFFD, �).
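
You can see the same kind of lossy conversion from JS with the standard TextEncoder/TextDecoder APIs. This is only an illustration of the behavior; it doesn't claim that console.log() goes through TextEncoder internally:

```javascript
// TextEncoder and TextDecoder are standard Web APIs (Deno, Node.js, browsers).
const bad = "\udc11";

// Encoding to UTF-8 replaces the lone surrogate with U+FFFD,
// whose UTF-8 bytes are EF BF BD.
const bytes = new TextEncoder().encode(bad);
console.log(Array.from(bytes, (b) => b.toString(16)).join(" ")); // ef bf bd

// Decoding gives back the replacement character, not the original string.
const roundTripped = new TextDecoder().decode(bytes);
console.log(roundTripped === "\ufffd"); // true
console.log(roundTripped === bad);      // false
```

So the original code unit is gone after the round trip, which is why the printed output is just �.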

#

I think there's a bug open for the REPL output to do that