#I have issues with UTF-8.

pearl shuttle
#

Hello Guys,
I wrote a script that edits a CSV to my liking by adding a column at the beginning and two at the end.
The script works fine, apart from the fact that the input file will contain umlauts (ä, ö, ü, etc.).
Usually I would read the file with -Encoding UTF8 and then export the new file the same way. However, it doesn't work: when I input a file with umlauts, it exports them garbled, like in the picture.

Please take a look at my script and point out my mistake.

solemn walrus
pearl shuttle
#

CSV Before the Script:

KUNDENNUMMER;KUNDENBEZEICHNUNG;INHABER_NAMENSZUSATZ;INHABER_DOMIZIL;INHABER_GESCHLECHT;HAUPTSEGMENT;SEGMENT;KUNDENKATEGORIE;NLNAME;NLNUMMER;REGION;BETREUER_ANREDE;BETREUER_VORNAME;BETREUER_NAME;BETREUER_TELEFON;BETREUER_KURZZEICHEN;BETREUER_FUNKTION;GEBURTSDATUM;KUNDENALTER;EROEFFNUNGSDATUM;STATUS_FINNOVA;AKTIVER_KUNDE;ADRESSAT_KUNDENNUMMER;ADRESSAT_TYP;ADRESSAT_ANREDE;ADRESSAT_TITEL;ADRESSAT_VORNAME;ADRESSAT_NAME;ADRESSAT_NAMENSZUSATZ;ADRESSAT_ZUHAENDEN;ADRESSAT_STRASSE;ADRESSAT_ADRESSZUSATZ;ADRESSAT_ORT;ADRESSAT_LAND_ISOCODE;ADRESSAT_BRIEFANREDE;ADRESSAT_GESCHLECHT;SPRACHE_KORRESPONDENZ;VERSANDART_KORRESPONDENZ;P_ADRESSAT_KUNDENNUMMER;P_ADRESSAT_ANREDE;P_ADRESSAT_TITEL;P_ADRESSAT_VORNAME;P_ADRESSAT_NAME;KEINE_WERBUNG;WERBESPERRE;VERSAND;EBANKING;ARCHIV;DOCTYPENR
123456;Skywalker Luke;;Schweiz;Männlich;PC;PCI;Namenkunde;Bern;200;Mitelland;Herr;Anakin;Skywalker;011 222 333 444 ;CS8;Betreuer Private Kunden;01.01.2000;22;01.01.2000;Aktiv;J;1234567;I;Herr;;Luke;Skywalker;;;Milchstrasse 1 ;;5000 Tatooine;Schweiz;Sehr geehrterr Herr Skywalker;Männlich;d;BPO;123456;;;;;N;N;Y;N;N;0

#

CSV After the Script:

"DOCID";"KUNDENNUMMER";"KUNDENBEZEICHNUNG";"INHABER_NAMENSZUSATZ";"INHABER_DOMIZIL";"INHABER_GESCHLECHT";"HAUPTSEGMENT";"SEGMENT";"KUNDENKATEGORIE";"NLNAME";"NLNUMMER";"REGION";"BETREUER_ANREDE";"BETREUER_VORNAME";"BETREUER_NAME";"BETREUER_TELEFON";"BETREUER_KURZZEICHEN";"BETREUER_FUNKTION";"GEBURTSDATUM";"KUNDENALTER";"EROEFFNUNGSDATUM";"STATUS_FINNOVA";"AKTIVER_KUNDE";"ADRESSAT_KUNDENNUMMER";"ADRESSAT_TYP";"ADRESSAT_ANREDE";"ADRESSAT_TITEL";"ADRESSAT_VORNAME";"ADRESSAT_NAME";"ADRESSAT_NAMENSZUSATZ";"ADRESSAT_ZUHAENDEN";"ADRESSAT_STRASSE";"ADRESSAT_ADRESSZUSATZ";"ADRESSAT_ORT";"ADRESSAT_LAND_ISOCODE";"ADRESSAT_BRIEFANREDE";"ADRESSAT_GESCHLECHT";"SPRACHE_KORRESPONDENZ";"VERSANDART_KORRESPONDENZ";"P_ADRESSAT_KUNDENNUMMER";"P_ADRESSAT_ANREDE";"P_ADRESSAT_TITEL";"P_ADRESSAT_VORNAME";"P_ADRESSAT_NAME";"KEINE_WERBUNG";"WERBESPERRE";"VERSAND";"EBANKING";"ARCHIV";"DOCTYPENR";"DMC_CODE";"JIRATICKET"
"";"123456";"Skywalker Luke";"";"Schweiz";"M�nnlich";"PC";"PCI";"Namenkunde";"Bern";"200";"Mitelland";"Herr";"Anakin";"Skywalker";"011 222 333 444 ";"CS8";"Betreuer Private Kunden";"01.01.2000";"22";"01.01.2000";"Aktiv";"J";"1234567";"I";"Herr";"";"Luke";"Skywalker";"";"";"Milchstrasse 1 ";"";"5000 Tatooine";"Schweiz";"Sehr geehrterr Herr Skywalker";"M�nnlich";"d";"BPO";"123456";"";"";"";"";"N";"N";"Y";"N";"N";"0";"123";"123"

#

@solemn walrus I opened it with Notepad++ this time, so Excel isn't involved here.

solemn walrus
#

Or your source file isn't UTF-8, so telling it to read the file that way borks the data from the get-go.

slender obsidian
pearl shuttle
#

It was simpler than I thought: the input file is ANSI.

slender obsidian
#

Well, that could do it too, hehe.

digital yoke
#

If the content before the script runs contains the characters you expect, then you can parse the multi-byte sequences using UTF-8 encoding.

UTF-8 and ANSI are indistinguishable unless the content includes characters above code 127, and the fact that you can see the characters means the source data itself is fine.
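To make the "above code 127" point concrete, here is a small sketch that prints the bytes each encoding produces for the same text. It assumes Windows-1252 as the ANSI code page (typical for Western European Windows), and `Show-Bytes` is a helper made up for this example. ASCII-only text comes out byte-for-byte identical; only the umlaut differs: one byte (`E4`) in Windows-1252 versus two (`C3 A4`) in UTF-8.

```powershell
$ansi = [Text.Encoding]::GetEncoding(1252)       # Windows-1252 (assumed ANSI code page)
$utf8 = [System.Text.UTF8Encoding]::new($false)  # UTF-8 without a BOM

# Hypothetical helper: formats a byte array as hex, e.g. "4D E4 6E".
function Show-Bytes([byte[]] $Bytes) {
    ($Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
}

$plain  = 'Maennlich'
$umlaut = "M$([char]0xE4)nnlich"   # [char]0xE4 is 'ä'; keeps this script file pure ASCII

foreach ($text in $plain, $umlaut) {
    [pscustomobject]@{
        Text        = $text
        Windows1252 = Show-Bytes ($ansi.GetBytes($text))
        Utf8        = Show-Bytes ($utf8.GetBytes($text))
    }
}
```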

This is the point you need to check:

    $content = Get-Content -Path $csvFilePath -Encoding UTF8

Once you've read that content in (you could have used Import-Csv there, for what it's worth), is the content correct in the console?
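One way to do that check without eyeballing the console is to look for U+FFFD, the replacement character that shows up as � in the output above: the decoder substitutes it for bytes that aren't valid in the encoding you asked for. The sketch below writes a hypothetical two-column sample file in Windows-1252, then reads it once as UTF-8 and once as Windows-1252. It uses `[IO.File]::ReadAllText` plus `ConvertFrom-Csv` rather than `Import-Csv -Encoding`, because the accepted `-Encoding` values differ between Windows PowerShell 5.1 and PowerShell 7.

```powershell
# Hypothetical sample standing in for the real input: two columns, saved as Windows-1252.
$path = Join-Path ([IO.Path]::GetTempPath()) 'encoding-check-demo.csv'
$ansi = [Text.Encoding]::GetEncoding(1252)
$utf8 = [System.Text.UTF8Encoding]::new($false)
[IO.File]::WriteAllText($path, "KUNDENNUMMER;INHABER_GESCHLECHT`r`n123456;M$([char]0xE4)nnlich`r`n", $ansi)

$results = foreach ($enc in $utf8, $ansi) {
    # Decode the whole file with one candidate encoding, then parse it as ';'-separated CSV.
    $rows  = @([IO.File]::ReadAllText($path, $enc) | ConvertFrom-Csv -Delimiter ';')
    $value = $rows[0].INHABER_GESCHLECHT
    [pscustomobject]@{
        Encoding = $enc.WebName
        Value    = $value
        # U+FFFD is what the decoder substitutes for bytes that are invalid in $enc.
        Garbled  = $value.IndexOf([char]0xFFFD) -ge 0
    }
}
Remove-Item -LiteralPath $path
$results
```

The UTF-8 row should show the same `M�nnlich` as in the thread, while the Windows-1252 row shows `Männlich`.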

bold hollow
#

If it was actually ASCII (aka US-ASCII), then parsing it as UTF-8 would almost certainly have worked, since ASCII text is also valid UTF-8.

#

But if it came from Excel, for instance, it was probably Windows-1252 (the default for most of us). That can be changed, but it's hidden under "Tools->Web Options" in the file save (export) dialog in Excel, so changing it is rare.
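If you don't know in advance which of the two you'll get, a rough heuristic is to try a strict UTF-8 decode first and fall back to Windows-1252 when it fails. Pure ASCII always passes the UTF-8 check, which is harmless because it decodes identically either way. This is a sketch, not a real charset detector: `Read-TextGuessingEncoding` is a made-up name, it assumes the only candidates are UTF-8 and Windows-1252, and a Windows-1252 file can occasionally happen to be valid UTF-8 (for example, text containing the sequence `Ã¤`).

```powershell
# Hypothetical helper: decode a file as UTF-8 if its bytes are valid UTF-8,
# otherwise as Windows-1252. Only these two candidates are considered.
function Read-TextGuessingEncoding([string] $Path) {
    $bytes = [IO.File]::ReadAllBytes($Path)
    # Second argument ($true) = throwOnInvalidBytes: invalid sequences raise an
    # exception instead of being replaced with U+FFFD.
    $strictUtf8 = [System.Text.UTF8Encoding]::new($false, $true)
    try {
        # GetString keeps a UTF-8 BOM as U+FEFF, so strip it.
        $text = $strictUtf8.GetString($bytes).TrimStart([char]0xFEFF)
        $name = 'utf-8'
    }
    catch {
        $text = [Text.Encoding]::GetEncoding(1252).GetString($bytes)
        $name = 'windows-1252'
    }
    [pscustomobject]@{ Encoding = $name; Text = $text }
}
```

Usage would be something like `(Read-TextGuessingEncoding $csvFilePath).Text | ConvertFrom-Csv -Delimiter ';'`.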

slender obsidian
#
    $ansi, $utf8 = [Text.Encoding]::GetEncoding(1252), [Text.Encoding]::GetEncoding('utf-8')
    $utf8.GetString($ansi.GetBytes('Ç'))

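recreates the decoding error: in Windows-1252, 'Ç' is the single byte 0xC7, which isn't a valid UTF-8 sequence on its own, so the UTF-8 decoder replaces it with �.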
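Continuing that snippet: decoding the same byte with the encoding it was written in gives the `Ç` back. It also shows why changing only the export encoding can't help: once the text contains U+FFFD, the original byte is gone, and re-encoding the string doesn't restore it. The character codes are spelled out with `[char]` so the example doesn't depend on how the script file itself is saved.

```powershell
$ansi = [Text.Encoding]::GetEncoding(1252)
$utf8 = [Text.Encoding]::GetEncoding('utf-8')

$bytes = $ansi.GetBytes([string][char]0xC7)   # 'Ç' in Windows-1252: the single byte 0xC7

$wrong = $utf8.GetString($bytes)   # read as UTF-8: U+FFFD, the � from the thread
$right = $ansi.GetString($bytes)   # read as Windows-1252: 'Ç' again

# Re-encoding the garbled string does not give back 0xC7; the information was
# lost at read time, so the output encoding can't repair it.
$reencoded = $ansi.GetBytes($wrong)
'{0} / {1} / re-encoded first byte: 0x{2:X2}' -f $wrong, $right, $reencoded[0]
```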

pearl shuttle
#

I'm away from home, so I can't recreate it right now. But a simple

    $content = Get-Content -Path $csvFilePath -Encoding Default

solved the issue. However, I have a new issue on my work PC that I'm working on. I haven't had time to recreate it yet; I'll update this thread once I have.
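For reference, here is one way the whole job could look end to end: decode with the code page the file was actually saved in, add the columns, and write UTF-8. This is a hypothetical reconstruction, since the original script isn't in this thread: the file names, the two sample input columns, and the placeholder values (`''` for DOCID and `'123'` for the new columns, copied from the output above) are assumptions, and Windows-1252 is assumed to be the ANSI code page. In Windows PowerShell 5.1, `-Encoding Default` means the system's ANSI code page, but `Default` may not mean the same thing in PowerShell 7, so the sketch names code page 1252 explicitly.

```powershell
# Hypothetical paths; replace with your own.
$inputPath  = Join-Path ([IO.Path]::GetTempPath()) 'kunden-ansi.csv'
$outputPath = Join-Path ([IO.Path]::GetTempPath()) 'kunden-utf8.csv'

# Sample input saved as Windows-1252 so the sketch runs on its own.
$ansi = [Text.Encoding]::GetEncoding(1252)
[IO.File]::WriteAllText($inputPath, "KUNDENNUMMER;INHABER_GESCHLECHT`r`n123456;M$([char]0xE4)nnlich`r`n", $ansi)

# 1. Decode with the encoding the file was actually saved in, then parse.
$rows = @([IO.File]::ReadAllText($inputPath, $ansi) | ConvertFrom-Csv -Delimiter ';')

# 2. Add one column in front and two at the end (values are placeholders).
$out = foreach ($row in $rows) {
    $props = [ordered]@{ DOCID = '' }
    foreach ($p in $row.PSObject.Properties) { $props[$p.Name] = $p.Value }
    $props['DMC_CODE']   = '123'
    $props['JIRATICKET'] = '123'
    [pscustomobject]$props
}

# 3. Serialize and write as UTF-8 with a BOM ($true); the BOM generally helps
#    Excel recognise the file as UTF-8.
$lines = $out | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
[IO.File]::WriteAllLines($outputPath, [string[]]$lines, [System.Text.UTF8Encoding]::new($true))
```

Drop the `$true` if whatever consumes the file doesn't expect a BOM.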