Hello I'm reaching out because I think I've run into a bug with PL/SQL Developer.
But I could be doing something wrong with some setting I'm missing.
Here's the scenario, I'm looking at Unicode support within our application and finding that when you insert characters into a table from PLSQL Developer. (I'm currently at Version 12.0.7.1837 but also tested this on version 15.0.1.2051 (64 bit) and he got the same result.)
Here's the problem, when I insert the first Emoji from the Unicode consortium's website (https://unicode.org/emoji/charts/full-emoji-list.html#1f600) using the windows + period (.), the character inserted is a different code point.
See code:
create table possedba.TestEmoji_t (
text varchar2(100 char)
);
insert into possedba.TestEmoji_t (Text)
values ('
');
commit;
select Text, DUMP(Text, 1016) TextDump from possedba.TestEmoji_t;
--Returns: Typ=1 Len=6 CharacterSet=AL32UTF8: ed,a0,bd,ed,b8,80
This byte code returns a "invalid continuation byte" error from our application because it appears to be inserting with a different code point.
When I insert through SQLPlus I get the code point I am expecting:
create table possedba.TestSqlPlusEmoji_t (
text varchar2(100 char)
);
insert into possedba.TestSqlPlusEmoji_t (Text)
values ('
');
commit;
select Text, DUMP(Text, 1016) TextDump from possedba.TestSqlPlusEmoji_t;
--Returns: CharacterSet=AL32UTF8: f0,9f,98,80
This code point returns what I would expect (f0,9f,98,80). This was displaying incorrectly until I turned on the Windows Locale to use UTF-8 and turned on PL/SQL Developer's Unicode support checkbox. Now the character displays correctly within PLSQL developer. But if I insert again it insertes the first code point still (ed,a0,bd,ed,b8,80).
Finally when I paste this emoji in python I get the same bytes back from the SQLPlus Test:
From Python 3.11:
>>> TestEmoji = '��'
>>> TestEmoji
'
'
>>> TestEmoji.encode('utf-8')
b'\xf0\x9f\x98\x80'
When I test use the Unicode number from the consortium, I get the bytes that are present in SQLPlus, which indicates the SQLPlus encoding is working correctly:
>>> unicodeTestEmoji = u'\U0001F600'
>>> unicodeTestEmoji
'
'
>>> unicodeTestEmoji.encode('utf-8')
b'\xf0\x9f\x98\x80'
It seems like PLSQL Developer is inserting the bytes for an old or incorrect character set encoding.
A coworker found this stack overflow (https://stackoverflow.com/questions/34409085/why-does-emoji-have-two-different-utf-8-codes-how-to-convert-emoji-from-utf-8) question that seemed to indicate those bytes (different emoji but the one with 6 bytes) might be coming from an old CESU-8 style encoding.
Can someone help with this? Am I just doing something wrong with my PLSQL Developer settings, or is this a Bug request that should be logged?
Thanks, here's some of my setup info:
NLS_LANG: AMERICAN_AMERICA.AL32UTF8
- This is set in the Registry for both 32-bit and 64-bit and environment variable.
Database Character Set: AL32UTF8
NLS_NCHAR_CHARACTERSET: AL16UTF16
Windows Locale: UTF-8 beta setting.
PLSQL Developer: 'Unicode enabled' flagged
Database Host OS: Linux x86 64-bit
Database Version: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0
But I could be doing something wrong with some setting I'm missing.
Here's the scenario, I'm looking at Unicode support within our application and finding that when you insert characters into a table from PLSQL Developer. (I'm currently at Version 12.0.7.1837 but also tested this on version 15.0.1.2051 (64 bit) and he got the same result.)
Here's the problem, when I insert the first Emoji from the Unicode consortium's website (https://unicode.org/emoji/charts/full-emoji-list.html#1f600) using the windows + period (.), the character inserted is a different code point.
See code:
create table possedba.TestEmoji_t (
text varchar2(100 char)
);
insert into possedba.TestEmoji_t (Text)
values ('

commit;
select Text, DUMP(Text, 1016) TextDump from possedba.TestEmoji_t;
--Returns: Typ=1 Len=6 CharacterSet=AL32UTF8: ed,a0,bd,ed,b8,80
This byte code returns a "invalid continuation byte" error from our application because it appears to be inserting with a different code point.
When I insert through SQLPlus I get the code point I am expecting:
create table possedba.TestSqlPlusEmoji_t (
text varchar2(100 char)
);
insert into possedba.TestSqlPlusEmoji_t (Text)
values ('

commit;
select Text, DUMP(Text, 1016) TextDump from possedba.TestSqlPlusEmoji_t;
--Returns: CharacterSet=AL32UTF8: f0,9f,98,80
This code point returns what I would expect (f0,9f,98,80). This was displaying incorrectly until I turned on the Windows Locale to use UTF-8 and turned on PL/SQL Developer's Unicode support checkbox. Now the character displays correctly within PLSQL developer. But if I insert again it insertes the first code point still (ed,a0,bd,ed,b8,80).
Finally when I paste this emoji in python I get the same bytes back from the SQLPlus Test:
From Python 3.11:
>>> TestEmoji = '��'
>>> TestEmoji
'

>>> TestEmoji.encode('utf-8')
b'\xf0\x9f\x98\x80'
When I test use the Unicode number from the consortium, I get the bytes that are present in SQLPlus, which indicates the SQLPlus encoding is working correctly:
>>> unicodeTestEmoji = u'\U0001F600'
>>> unicodeTestEmoji
'

>>> unicodeTestEmoji.encode('utf-8')
b'\xf0\x9f\x98\x80'
It seems like PLSQL Developer is inserting the bytes for an old or incorrect character set encoding.
A coworker found this stack overflow (https://stackoverflow.com/questions/34409085/why-does-emoji-have-two-different-utf-8-codes-how-to-convert-emoji-from-utf-8) question that seemed to indicate those bytes (different emoji but the one with 6 bytes) might be coming from an old CESU-8 style encoding.
Can someone help with this? Am I just doing something wrong with my PLSQL Developer settings, or is this a Bug request that should be logged?
Thanks, here's some of my setup info:
NLS_LANG: AMERICAN_AMERICA.AL32UTF8
- This is set in the Registry for both 32-bit and 64-bit and environment variable.
Database Character Set: AL32UTF8
NLS_NCHAR_CHARACTERSET: AL16UTF16
Windows Locale: UTF-8 beta setting.
PLSQL Developer: 'Unicode enabled' flagged
Database Host OS: Linux x86 64-bit
Database Version: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0