Loading data into Vertica database from a file

Moderator: NorbertKrupa

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: Loading data into Vertica database from a file

Post by id10t » Sat Jun 08, 2013 12:25 pm

Hi!

Code: Select all

CREATE TABLE public.apache_log
(
    eip char(15),
    uip char(15),
    datetime timestamp,
    verb varchar(8),
    dt varchar(256),
    df varchar(256),
    url varchar(256),
    http_version varchar(36),
    ref_url varchar(256),
    status int,
    cl int,
    bytes int,
    dur int,
    cache int,
    ua varchar(256)
);
Your string:

Code: Select all

daniel@synapse:~$ cat /tmp/apache.log 
213.4.28.244 - 95.123.101.114 - - [05/Jun/2013:08:27:47] "GET /X/0/crossdomain.xml HTTP/1.1 http://democdn.biz.tm/player-5.1.897.swf" 200 307 307 0 0 "Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0"
Copy statement:

Code: Select all

copy apache_log
(
    record            filler varchar(65000),
    eip            as split_part(record,' ', 1),
    uip            as split_part(record,' - ', 2),
    datetime       as to_timestamp(regexp_substr(record, '\[([^]]+)]', 1,1,'',1), 'DD/Mon/YYYY:HH:MI:SS'),
    verb           as regexp_substr(record, '"([^\s]+)',1,1,'',1),
    dt             as regexp_substr(record, '"\w+\s(/[^/]+)',1,1,'',1),
    df             as regexp_substr(record, '"\w+\s(/[^/]+)(/[^/]+)',1,1,'',2),
    url            as regexp_substr(record, '"\w+\s(/[^/]+)(/[^/]+)(/[^/]+)\s',1,1,'',3),
    http_version   as regexp_substr(record, '"\w+\s[^\s]+\s([^\s]+)',1,1,'',1),
    ref_url        as regexp_substr(record, 'http://[^"]+'),
    status         as regexp_substr(record, '(?<=\s)\d+(?=\s)', 1,1)::int,
    cl             as regexp_substr(record, '(?<=\s)\d+(?=\s)', 1,2)::int,
    bytes          as regexp_substr(record, '(?<=\s)\d+(?=\s)', 1,3)::int,
    dur            as regexp_substr(record, '(?<=\s)\d+(?=\s)', 1,4)::int,
    cache          as regexp_substr(record, '(?<=\s)\d+(?=\s)', 1,5)::int,
    ua             as regexp_substr(record, '"[^"]+"$')
)
from '/tmp/apache.log' direct abort on error;
Test:

Code: Select all

daniel=> \e
 Rows Loaded 
-------------
           1
(1 row)
daniel=> \x
Expanded display is on.
daniel=> select * from apache_log ;
-[ RECORD 1 ]+--------------------------------------------------------------------
eip          | 213.4.28.244   
uip          | 95.123.101.114 
datetime     | 2013-06-05 08:27:47
verb         | GET
dt           | /X
df           | /0
url          | /crossdomain.xml
http_version | HTTP/1.1
ref_url      | http://democdn.biz.tm/player-5.1.897.swf
status       | 200
cl           | 307
bytes        | 307
dur          | 0
cache        | 0
ua           | "Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0"

ssrao
Newbie
Newbie
Posts: 14
Joined: Fri Jun 07, 2013 2:37 pm

Re: Loading data into Vertica database from a file

Post by ssrao » Sat Jun 08, 2013 2:31 pm

Thanks sKwa.
I am working on apache parser UDF and it will be useful for parse those records and loading data into table from file directly...
Can't we parse and load data into table using copy command for above case?

ssrao
Newbie
Newbie
Posts: 14
Joined: Fri Jun 07, 2013 2:37 pm

Re: Loading data into Vertica database from a file

Post by ssrao » Sat Jun 08, 2013 2:34 pm

Thanks...

Just now I have seen your output..

ssrao
Newbie
Newbie
Posts: 14
Joined: Fri Jun 07, 2013 2:37 pm

Re: Loading data into Vertica database from a file

Post by ssrao » Sat Jun 08, 2013 3:27 pm

19:22:25 [COPY - 0 row(s), 0.000 secs] [Error Code: 4800, SQL State: 22001] [Vertica][VJDBC](4800) ERROR: String of 2 octets is too long for type Char(1)
still I am facing above errors for the above example because your table structure and my table structure are different.
Please find the below table
create table Inbound_access_log1
(Inbound_Key Identity(1,1),
EndPoint_IP_Address VARCHAR(30),
Client_IP_Address VARCHAR(30),
Request_Time TIMESTAMP,
HTTP_Request_Method VARCHAR(10),
Delivery_Type CHAR,
Delivery_Format CHAR,
Bucket_ID INTEGER,
Requested_URL VARCHAR(500),
Asset_Name VARCHAR(50),
HTTP_Version VARCHAR(20),
Referer_URL VARCHAR(500),
HTTP_Status_Code INTEGER,
Content_Length INTEGER,
Bytes_Transfered INTEGER,
Duration INTEGER,
isFromCache CHAR(2),
User_Agent VARCHAR(250),
PRIMARY KEY(Inbound_Key));
Examples of the input files are
-------------------------------------
213.4.28.244 - 95.123.101.114 - - [05/Jun/2013:08:27:50] "GET /V/0/_15753/QA_B102108Z1369846463.mp4?token=1401382686-9fe88327fc34caefeb4d41e3a3311fa7&start=0 HTTP/1.1 http://democdn.biz.tm/player-5.1.897.swf" 200 657190 657190 2 0 "Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0"

Output
--------
EndPoint_IP_Address 213.4.28.244
Client_IP_Address 95.123.101.114
Request_Time 05/Jun/2013:08:27:50
HTTP_Request_Method GET
Delivery_Type V
Delivery_Format 0
Bucket_ID _15753
Requested_URL QA_B102108Z1369846463.mp4?token=1401382686-9fe88327fc34caefeb4d41e3a3311fa7&start=0
Asset_Name QA_B102108Z1369846463.mp4
HTTP_Version HTTP/1.1
Referer_URL http://democdn.biz.tm/player-5.1.897.swf
HTTP_Status_Code 200
Content_Length 657190
Bytes_Transfered 657190
Duration 2
isFromCache 0
User_Agent "Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0"

Points to remember
1.if the respective field is not having any value in the later files then that particular field output value should placed as null
2.If any record is not loaded into table,then that record should goes to rejected file which will place in the directory '/home/cdn'
3.How to load all log files at one time using copy command

Thanks in Advance.

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: Loading data into Vertica database from a file

Post by id10t » Sat Jun 08, 2013 7:45 pm

Hi!

Since you are loading data via Java (...[Vertica][VJDBC](4800) ERROR...), so better if you will parse it with Java (it will be much more easer and faster).
Internet is full in examples how to parse Apache log.

ssrao
Newbie
Newbie
Posts: 14
Joined: Fri Jun 07, 2013 2:37 pm

Re: Loading data into Vertica database from a file

Post by ssrao » Sun Jun 09, 2013 5:05 am

Can any one give correct copy command to get above output for above table?

bwest@sdsc.edu
Newbie
Newbie
Posts: 3
Joined: Mon Jan 27, 2014 7:20 pm

Re: Loading data into Vertica database from a file

Post by bwest@sdsc.edu » Mon Jan 27, 2014 7:40 pm

I am looking for assistance on the same thing. I am attempting to load data from tab delimited files into flex tables. When I run the copy command I always receive the following error:

ERROR 4800: String of 2 octets is too long for type Varchar(1)

This is regardless of the file structure. Two questions, 1. What does the error mean exactly ( I am assuming it is trying to put a 2 byte value in a 1 byte slot but that doesn't make sense as far as my understanding of flex tables are concerned). 2. Is there anyway to pre format the data to avoid this error. I am using Vertica 7. The copy command I am using is:

COPY staging.subject_demographics from'/home/dbadmin/NA12877_Demographic_data.txt' parser fdelimitedparser(delimiter='\t');

I don't believe there is a syntax error but I am a new user and there really aren't many examples out there.

Thanks,

Post Reply

Return to “Vertica Data Load”