#hadfdasdf


harsh kiteBOT
#

When your question is answered use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question run !howto ask.

limber hound
#

the encoding of characters is not defined by the standard, but on almost all systems char uses ASCII for the "basic character set", which is a set of characters that are each guaranteed to fit in 1 byte

there can be more characters, called the "extended character set" which can require more than 1 byte to store

the point of wchar_t is that it can represent all characters of the basic and extended character sets in one wchar_t

#

there is a set of characters that can be represented (almost always ASCII) in a char (1 byte), and other characters may be stored in a string with more than 1 char (multiple bytes)

wchar_t is so that you can represent any character in one value (of type wchar_t)

#

no, it depends upon the implementation what characters can be used

#

whatever compiler and platform you're compiling for, there is some flexibility in what is allowed so everyone doesn't have to do the exact same thing

#

;compile

char c='Ѐ';
rare pantherBOT
#
Compiler Output
<source>: In function 'main':
<source>:3:8: warning: multi-character character constant [-Wmultichar]
    3 | char c='Ѐ';
      |        ^~~
<source>:3:8: warning: overflow in conversion from 'int' to 'char' changes value from '53376' to '-128' [-Woverflow]
limber hound
#

;compile

#include<stddef.h>
wchar_t w=L'Ѐ';
rare pantherBOT
#
Compilation successful

No output.

limber hound
#

no warnings here because it can be represented in wchar_t

#

yes each character will have a unique value, and the compiler will know what all of the values mean
so doing something like char c='a'; will result in the correct value for a, so that putchar(c); will correctly print out the letter a

#

it's up to the implementation, on windows it's UTF-16 and on most other operating systems it's UTF-32

not sure what you mean by "what is the biggest amount of bytes the wchar_t data type can hold?"

#

the size of a wchar_t object will always be the same

#

depends upon the implementation, on windows it's 2 bytes and on most other operating systems it's 4 bytes

#

yeah

#

wchar_t can represent any character the implementation defines

warped junco
#

a character set is something like UTF-8 or ASCII

#

it tells you what numeric values correspond to what characters

#

char16_t is basically designed for storing UTF-16 characters, not sure if it's strictly mandated to be UTF-16 though

#

and when speaking about "width" for integers generally, that refers to the number of bits

#

nothing is really automatic there

#

it's the responsibility of the developer to ensure that only UTF-16 characters go into char16_t for instance

#

there is nothing in C that would prevent you from doing this

#

or that would prevent you from putting together a char16_t[] that isn't actually a valid UTF-16 string

#

it's all on you

warped junco
#

it's not UTF-32 on Windows

#

wchar_t is basically a pile of garbage

#

don't ever use it unless someone forces you at gun point

#

like the Windows API does

warped junco
#

basically wchar_t was originally intended to support "all characters", but that was back when unicode was still fairly small

#

it quickly became clear that 16 bits aren't enough, but Windows got stuck with a 16-bit wchar_t while it's 32 bits wide on Linux

#

you use wprintf for wide characters

#

and I'm not sure if it's restricted to be only UTF-16, or only UTF-32

#

but any sane implementation will use one of those two

#

even if it's not strictly guaranteed, it would be inconceivably rare to see anything else

#

the character set of I/O functions is dictated by the current locale

#

it's not something that's mandated by the language

warped junco
#

the locale is actually responsible for a huge amount of things, including dictating the character set of char in I/O functions

#

it dictates things like date/time formats, how numbers are printed, etc.

limber hound
#

anything outside of the "basic character set" can be changed by the current locale

limber hound
#

if the basic character set is ASCII, then it will be unaffected by the locale

#
char c='a';
setlocale(...);//change the locale to something else
//still guaranteed to refer to a
sharp oar
#

It's just implementation-defined: on Windows it's two bytes, on other platforms it's normally 4, although I think you'll mostly see wchar_t in code targeting Windows

warped junco
#

the implementation is the term that C uses for the compiler, OS, and CPU architecture

#

basically the thing that makes C run, as a whole
although the term mostly refers to the compiler, and the compiler then makes decisions based on the OS and CPU

#

yes, the size of wchar_t will be 2 bytes on Windows, and 4 bytes on Linux

#

this is fixed by the platform's ABI, which is specified in the operating system's documentation

#

and the compiler just chooses one of two options based on what OS you're compiling for

harsh kiteBOT
#

@mystic elk Has your question been resolved? If so, run !solved :)

limber hound
#

'a' results in the value correct for doing char c='a';, and L'a' results in the value correct for doing wchar_t w=L'a';

#

L'a' normally has the same value as btowc('a') (btowc returns a wint_t, and its result depends on the current locale)

warped junco
#

what about L"1234"

limber hound
#

what are you using to run the code?

warped junco
#

that implies that printing wide strings works in principle, but perhaps the terminal that you're using to display this doesn't have a font that can display these characters

#

at least that's one explanation

limber hound
#

setlocale(LC_ALL,"C.UTF-8"); seems to fix it for me

#

a lot of platforms won't support unicode in the default "C" locale

#

yeah

#

because that's the name of the locale that I'm using to enable unicode. UTF-8, UTF-16, and UTF-32 are all just different ways of encoding unicode

#

no

#

for windows I think you will need to set the codepage

spice wyvern
#

each of them can encode all of unicode

#

it will be an integer like short or long

spice wyvern
#

yes, a character set is the set of characters that can be encoded

spice wyvern
#
  • L is probably derived from the word "long", and it makes a string literal into a wide character string literal.
  • there are other prefixes too: C11 added u8, u, and U (for UTF-8, char16_t, and char32_t string literals).
  • L is not defined in the wchar.h library. it is part of the string literal. the meaning of L"hello" is built into the language, as is the meaning of 3 or "hello".
  • btowc converts one character only
spice wyvern
#

'h' is an int in C (a char in C++)
"hello" is an array of char
L'h' is a wchar_t
L"hello" is an array of wchar_t

limber hound
#

it's part of the string literal or character literal

#

the standard calls them an "encoding prefix"

harsh kiteBOT
#

Thank you and let us know if you have any more questions!

This thread is now set to auto-hide after an hour of inactivity

harsh kiteBOT
#

Please Do Not Delete Posts!

Please don't delete forum posts. They can be helpful to refer to later and other members can learn from them. In the future you can use !solved to close a post and mark a post as solved.