I'm currently using copy command to load data into vertica.
I volumes of 1TB gziped data daily.
copy table (col1,col2,col3) FROM 1.dat GZIP DELIMITER '|'
My problem is that my data that goes into the columns may contain the delimiter '|' or any other asscii character.
From the documentation there is an option to use ESCAPE AS '~' which mean i will need to escape any data in column that may contain '|' and change it to '~|'
my only question is what happend if my data also has '~|' before i escape it? should i change it to '~~|' or any other alternative?
Would using ENCLOSED BY will help me with it? and what do i need to do if my data contains the ENCLOSED BY chatacter?
I prefer not to use the ENCLOSED BY as it will increase the data transfer of my files, but if that's the best solution i will reconsider.
example for raw data in my system:
(the number represents the column that data belongs to)
1|2|2|3
1|2|3
1|2~|2|3
1|2"2|3
||
|2|3
|2|
|2|3
1||
1||3
1|2|
1|2|3
In all of the above cases the original data contains 3 columns of raw data before escaping it.
Thanks for the help.
Koby
copy load data that may contain the delimiter
Moderator: NorbertKrupa
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: copy load data that may contain the delimiter
Hmm. I wonder if you could try something like this?
An alternate solution could be to load into a separate table and then copy the cleansed data over to your real table:
Code: Select all
dbadmin=> create table test (col1 varchar(10), col2 varchar(10), col3 varchar(10), col4 varchar(10)) segmented by hash(col1, col2, col3) all nodes;
CREATE TABLE
dbadmin=> copy test from '/home/dbadmin/test.txt' delimiter '|' trailing nullcols ;
Rows Loaded
-------------
11
(1 row)
dbadmin=> select * from test order by col1, col2;
col1 | col2 | col3 | col4
------+------+------+------
1 | 2 | 2 | 3
1 | 2 | 3 |
1 | 2 | |
1 | 2"2 | 3 |
1 | 2~ | 2 | 3
1 | | 3 |
1 | | |
| 2 | 3 |
| 2 | 3 |
| 2 | |
| | |
(11 rows)
dbadmin=> update test
dbadmin-> set col2 = (case when col4 is not null then col2 || '|' || col3 else col2 end),
dbadmin-> col3 = nvl2(col4, col4, col3);
OUTPUT
--------
11
(1 row)
dbadmin=> alter table test drop column col4 cascade;
ALTER TABLE
dbadmin=> select * from test order by col1, col2;
col1 | col2 | col3
------+------+------
1 | 2 | 3
1 | 2 |
1 | 2"2 | 3
1 | 2|2 | 3
1 | 2~|2 | 3
1 | | 3
1 | |
| 2 | 3
| 2 | 3
| 2 |
| |
(11 rows)
Code: Select all
dbadmin=> drop table test;
DROP TABLE
dbadmin=> create table test (col1 varchar(10), col2 varchar(10), col3 varchar(10)) segmented by hash(col1, col2, col3) all nodes;
CREATE TABLE
dbadmin=> create table test_load (col1 varchar(10), col2 varchar(10), col3 varchar(10), col4 varchar(10)) segmented by hash(col1, col2, col3) all nodes;
CREATE TABLE
dbadmin=> copy test_load from '/home/dbadmin/test.txt' delimiter '|' trailing nullcols ;
Rows Loaded
-------------
11
(1 row)
dbadmin=> insert /*+ direct */ into test
dbadmin-> select col1, (case when col4 is not null then decode(ascii(col2), null, col2 || col3, col2 || '|' || col3) else col2 end),
dbadmin-> nvl2(col4, col4, col3)
dbadmin-> from test_load;
OUTPUT
--------
11
(1 row)
dbadmin=> select * from test order by 1, 2;
col1 | col2 | col3
------+------+------
1 | 2 | 3
1 | 2 |
1 | 2"2 | 3
1 | 2|2 | 3
1 | 2~|2 | 3
1 | | 3
1 | |
| 2 | 3
| 2 | 3
| 2 |
| |
(11 rows)
dbadmin=> truncate table test_load;
TRUNCATE TABLE
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.