Hi,
I am facing a problem while loading data into Vertica. My source system sends me some Chinese characters, and the records containing them are being rejected on insert.
Is there any special data type I have to define for that specific column?
Thanks,
Jagadeesh.
How to insert Chinese characters in Vertica
Moderator: NorbertKrupa
Re: How to insert Chinese characters in Vertica
I am using Informatica to load the data. In the session log I can see some Chinese characters along with a "bad record" message, so I guess the rejection is caused by these special characters.
Re: How to insert Chinese characters in Vertica
Is there any special data type for non-ASCII data?
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
Re: How to insert Chinese characters in Vertica
Hi,
There is no special data type in Vertica for storing Chinese characters. All character data in Vertica is stored using UTF8 encoding. Just make sure the source data is also encoded in UTF8.
Here is an example.
I created a text file named chinese.txt using Windows Notepad. The file contains the three Chinese characters that can be translated to "I Love You" in English.
They are: 我爱你
I made sure to save the file in Notepad using UTF8 encoding.
Next, I transferred the file to the first node on my Vertica cluster.
The Linux file command can be used to verify that the file's encoding is UTF8:
Code:
[dbadmin@vertica01 ~]$ file chinese.txt
chinese.txt: UTF-8 Unicode text, with no line terminators
Notice that if I cat the file, I get a bunch of garbage because my terminal isn't set up to display the characters correctly. But that's okay.
Code:
[dbadmin@vertica01 ~]$ cat chinese.txt
æç±ä½
Next, I created a table in Vertica named chinese with one varchar column and then loaded the data from the chinese.txt file into it:
Code:
dbadmin=> create table chinese (c1 varchar(100));
CREATE TABLE
dbadmin=> copy chinese from '/home/dbadmin/chinese.txt';
Rows Loaded
-------------
1
(1 row)
dbadmin=> select * from chinese;
c1
--------
æç±ä½
(1 row)
I still get the garbage output from my SQL statement.
But the good news is that the data in the table is fine.
If I use my DbVisualizer client to query the table, I see the Chinese characters just fine!
Note that I had to change the grid font in DbVisualizer to "Arial Unicode MS" to display the Chinese characters correctly...
The point is that Vertica can store Chinese characters as UTF8 in a varchar column.
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
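A side note on the varchar(100) in the example above: Vertica's documentation defines VARCHAR length in bytes (octets), not characters, and each of the three characters in the example takes three bytes in UTF8, so a column sized by character count can turn out too small. Below is a minimal Python sketch, not part of the original post, that checks whether some bytes are valid UTF8 and how many bytes they need. The helper name utf8_report is made up for illustration, and GB18030 is only an assumed example of a non-UTF8 source encoding.

```python
def utf8_report(raw: bytes) -> dict:
    """Report whether raw bytes are valid UTF-8 and how much room they need."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that failed to decode
        return {"valid": False, "bad_offset": err.start}
    # chars = number of code points, bytes = size a byte-based column must hold
    return {"valid": True, "chars": len(text), "bytes": len(raw)}


# The three characters from the example, encoded as UTF-8
utf8_sample = "我爱你".encode("utf-8")
print(utf8_report(utf8_sample))

# The same text in GB18030 (assumed here as a typical legacy encoding)
# is not valid UTF-8, which is the kind of input a load could reject
legacy_sample = "我爱你".encode("gb18030")
print(utf8_report(legacy_sample))
```

Running the same check on the raw bytes of a rejected record is a quick way to tell whether the source system is really sending UTF8.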
Re: How to insert Chinese characters in Vertica
I never knew about this.
Thanks for sharing that knicely.
- Beginner
- Posts: 42
- Joined: Thu Apr 19, 2012 9:03 pm
Re: How to insert Chinese characters in Vertica
That's pretty cool! Jim, how do you save a file in Notepad as UTF8?
Re: How to insert Chinese characters in Vertica
Hi billy,
I am scratching my head on this too.
I didn't see any option while saving.
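For what it's worth, in the Notepad versions I am aware of, the Save As dialog has an Encoding drop-down next to the Save button, and UTF-8 is one of the choices. If a file has already been saved in some other encoding, it can also be converted afterwards. Here is a minimal Python sketch of that idea, under stated assumptions: the function names to_utf8 and reencode_file are made up for illustration, and the source encoding (gb18030 in the demo) has to be known in advance because it cannot be reliably guessed from the bytes alone.

```python
from pathlib import Path


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw bytes from source_encoding and re-encode them as UTF-8."""
    text = raw.decode(source_encoding)
    # Some Windows editors prepend a byte order mark; drop it if present
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.encode("utf-8")


def reencode_file(src: str, dst: str, source_encoding: str) -> None:
    """Read src in source_encoding and write it to dst as UTF-8 without a BOM."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))


# In-memory demo: GB18030 bytes (an assumed source encoding) become UTF-8
converted = to_utf8("我爱你".encode("gb18030"), "gb18030")
print(converted == "我爱你".encode("utf-8"))
```

The converted file can then be checked with the Linux file command shown earlier in the thread before loading it.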