Bug with UTF8 encoding - Again

Guttorm

Both Oracle and PL/SQL Developer seem to be a real pain to work with when it comes to Unicode text encoded as UTF8.

I thought I had nailed down a recipe that worked for PL/SQL Developer 11.0.3. Then we upgraded to 11.0.5, and now it corrupts our Unicode files again.

1) I have configured PL/SQL Developer to enable UTF8
2) I have configured PL/SQL Developer to save as UTF8 without BOM.
3) I have set NLS_LANG to AMERICAN_AMERICA.AL32UTF8

Greek letter

I save a file containing just the Greek "theta" character, in UTF8 format, in Notepad++.
Using a hex viewer I can see that it contains the two bytes CE B8, which is correct:
http://www.fileformat.info/info/unicode/char/03b8/index.htm
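For reference, the expected bytes can be reproduced and inspected without a separate hex viewer. A small sketch (the file name is just an example, not part of the original test):

```python
import os
import tempfile

# Write the Greek theta character (U+03B8) as UTF8 without a BOM,
# mimicking the Notepad++ test file, then dump the raw bytes:
path = os.path.join(tempfile.mkdtemp(), "theta.txt")
with open(path, "wb") as f:
    f.write("θ".encode("utf-8"))

with open(path, "rb") as f:
    raw = f.read()

print(raw.hex(" "))  # ce b8
```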

When I open this as a Program File, it displays correctly, but when I save it I get C3 8E C2 B8 0D 0A.
If we ignore the CR+LF at the end, it looks as if each original byte has been interpreted as a separate character and then re-encoded in UTF8.
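That diagnosis is a classic double encoding, assuming the editor misreads the UTF8 bytes as a single-byte code page such as Latin-1. It can be sketched like this:

```python
# UTF8 bytes for Greek theta (U+03B8):
theta_utf8 = "θ".encode("utf-8")
assert theta_utf8 == b"\xce\xb8"

# If an editor misreads those bytes as Latin-1, it sees two characters,
# U+00CE and U+00B8, instead of a single theta...
misread = theta_utf8.decode("latin-1")
assert misread == "\u00ce\u00b8"

# ...and saving that text as UTF8 then encodes each one separately,
# producing exactly the corrupted bytes reported above:
assert misread.encode("utf-8") == b"\xc3\x8e\xc2\xb8"  # C3 8E C2 B8
```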

When I open it as a SQL Script it is displayed as two characters.
And when I save this, I get C3 8E C2 B8 0D 0A, same as above, and equally wrong.

Norwegian letter

When I do the same test with a Norwegian character, which is in my local character set, the Program Editor loads and saves it correctly, except that it still appends CR+LF; that is OK, and probably desirable.

The SQL editor makes the same kind of hash of it as it does of the theta character.
This came as a surprise, since the SQL editor seems to handle our regular files correctly. I assume some other issue is triggered by tiny files?

All in all really unhappy with PL/SQL Developer at the moment.

(By the way, the forum also truncated my first posting attempt, where I had pasted in a Norwegian character, even though it looked OK in the preview. Not happy about that either.)
 
Hello,

I have configured PL/SQL Developer to enable UTF8 and to save as UTF8 without BOM.

A similar problem occurred when I opened a UTF8 (without BOM) + CRLF file with Cyrillic characters in the Program window. The same file saved as UTF8 (without BOM) + LF opened correctly.
 
Hi Marco,
First of all, many thanks for the great product, which I have been using for more than 10 years!
I have just come across an issue which looks similar to the one being discussed here.
Here is a detailed description.
Version 11.0.6.1776, Windows 7 (64-bit).
If you try to open a reasonably large file containing two-byte characters then the file will be displayed incorrectly.
One can reproduce the issue by following these steps:
- open the source of the package dbms_preup
- put a two-byte character at the end of the package
- save as
- check that the file is saved in UTF8
- close the package
- open the file
- observe that the character is converted to two characters

To avoid the issue one can add a two-byte character at the beginning of the file. In that case the file will be displayed correctly.
 
In my original posting, I said that the SQL editor seemed to behave sensibly for our regular files. Well, today it corrupted a file.

This happened after I had inserted a large block of ASCII text at the top of the file. When we subsequently open and edit the file, all UTF8 characters are corrupted.
I assume the problem is that the editor scans only the first few KB of text to guess which character set to use.
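If that guess is right, the behaviour matches a naive detector that only sniffs a fixed-size prefix. A hypothetical sketch (the function and the 4 KB prefix size are assumptions, not PL/SQL Developer's actual logic):

```python
def sniff_is_utf8(data: bytes, prefix_size: int = 4096) -> bool:
    """Hypothetical detector: report UTF8 only if the first
    prefix_size bytes contain a non-ASCII byte and decode as UTF8."""
    head = data[:prefix_size]
    if all(b < 0x80 for b in head):
        return False  # all-ASCII prefix -> guesses ANSI, wrongly here
    try:
        head.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A large ASCII block at the top pushes every multibyte character
# past the sniffed prefix, so the file is misdetected as ANSI:
data = b"-- " + b"x" * 5000 + "\n-- θ\n".encode("utf-8")
assert sniff_is_utf8(data) is False

# A Unicode character near the start of the file fixes the guess:
fixed = "-- θ\n".encode("utf-8") + data
assert sniff_is_utf8(fixed) is True
```

This also matches the workaround mentioned earlier in the thread: placing a two-byte character at the beginning of the file makes detection succeed.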

For now I have 'fixed' it by making sure I have a comment with a unicode character at the start of the file.
 