UDF CsvParser can't load Date
Moderator: NorbertKrupa
UDF CsvParser can't load Date
I am trying to use TraditionalCsvParser from Vertica's SDK examples and have trouble loading data with a date in it. Other entries work fine. Has anyone experienced this problem? How to fix it?
Re: UDF CsvParser can't load Date
Hi!
Can you post a log of compilation? If there are any errors or warnings? Did you install libboost-dev?
Because I have no this problem:
My compilation log:
Can you post a log of compilation? If there are any errors or warnings? Did you install libboost-dev?
Because I have no this problem:
Code: Select all
daniel=> CREATE LIBRARY TraditionalCsvParserLib AS '/opt/vertica/sdk/examples/ParserFunctions/CsvParser.so';
CREATE LIBRARY
daniel=> CREATE PARSER CSVParser AS LANGUAGE 'C++' NAME 'CsvParserFactory' LIBRARY TraditionalCsvParserLib;
CREATE PARSER FUNCTION
daniel=> create table csv (d date, i int, v varchar);
CREATE TABLE
Code: Select all
daniel=> copy csv from stdin with parser CsvParser();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 2000-01-01,666,a
>> \.
daniel=> select * from csv;
d | i | v
------------+-----+---
2000-01-01 | 666 | a
(1 row)
Code: Select all
daniel@synapse:~$ g++ -D HAVE_LONG_INT_64 \
> -I /opt/vertica/sdk/include \
> -I /opt/vertica/sdk/examples/HelperLibraries \
> -I /usr/include/boost \
> -Wall -shared -Wno-unused-value -fPIC \
> -o CsvParser.so /opt/vertica/sdk/examples/ParserFunctions/TraditionalCsvParser.cpp \
> /opt/vertica/sdk/include/Vertica.cpp
In file included from /opt/vertica/sdk/include/VerticaUDx.h:57:0,
from /opt/vertica/sdk/include/Vertica.h:76,
from /opt/vertica/sdk/examples/ParserFunctions/CoroutineHelpers.h:13,
from /opt/vertica/sdk/examples/ParserFunctions/ContinuousUDParser.h:11,
from /opt/vertica/sdk/examples/ParserFunctions/TraditionalCsvParser.cpp:3:
/opt/vertica/sdk/include/BasicsUDxShared.h:278:50: warning: narrowing conversion of ‘18444492273895866368ull’ from ‘long long unsigned int’ to ‘Vertica::vint {aka long long int}’ inside { } is ill-formed in C++11 [-Wnarrowing]
In file included from /opt/vertica/sdk/include/VerticaUDx.h:57:0,
from /opt/vertica/sdk/include/Vertica.h:76,
from /opt/vertica/sdk/include/Vertica.cpp:38:
/opt/vertica/sdk/include/BasicsUDxShared.h:278:50: warning: narrowing conversion of ‘18444492273895866368ull’ from ‘long long unsigned int’ to ‘Vertica::vint {aka long long int}’ inside { } is ill-formed in C++11 [-Wnarrowing]
Re: UDF CsvParser can't load Date
Thanks for your reply. I do not have libboost-dev installed, but I don't get any warnings. What is libboost-dev?
Here is my compilation log.
g++ -D HAVE_LONG_INT_64 -I /opt/vertica/sdk/include -I/opt/vertica/sdk/examples/HelperLibraries -Wall -shared -Wno-unused-value -fPIC -o CsvParser.so: examples/ParserFunctions/TraditionalCsvParser.cpp /opt/vertica/sdk/include/Vertica.cpp
CREATE LIBRARY CsvParserLib AS '/home/dbadmin/CsvParser.so';
CREATE PARSER CsvParser AS LANGUAGE 'C++' NAME ‘CsvParserFactory' LIBRARY CsvParserLib;
CREATE TABLE GL.GL_BALANCES
( LEDGER_ID NUMBER(15,0) NOT NULL,
CODE_COMBINATION_ID NUMBER(15,0) NOT NULL,
CURRENCY_CODE VARCHAR2(15) NOT NULL,
PERIOD_NAME VARCHAR2(15) NOT NULL,
ACTUAL_FLAG VARCHAR2(1) NOT NULL,
LAST_UPDATE_DATE DATE NOT NULL,
LAST_UPDATED_BY NUMBER(15,0) NOT NULL,
BUDGET_VERSION_ID NUMBER(15,0),
ENCUMBRANCE_TYPE_ID NUMBER(15,0),
TRANSLATED_FLAG VARCHAR2(1)
);
copy GL.GL_BALANCES from '/home/dbadmin/Data/GLtest.csv' with parser CsvParser() no commit;
GLtest.csv >
289,110052,GBP,APR-01/02,B,1-Oct-03,1001406,1637,,
289,110053,GBP,APR-01/02,A,1-Oct-03,1001406,,,
.....
On a different note, the above compilation creates file *.so: instead of *.so and I have to rename it to *.so before creating library. If I put a space between .so and :, I get error. What am I doing wrong?
Here is my compilation log.
g++ -D HAVE_LONG_INT_64 -I /opt/vertica/sdk/include -I/opt/vertica/sdk/examples/HelperLibraries -Wall -shared -Wno-unused-value -fPIC -o CsvParser.so: examples/ParserFunctions/TraditionalCsvParser.cpp /opt/vertica/sdk/include/Vertica.cpp
CREATE LIBRARY CsvParserLib AS '/home/dbadmin/CsvParser.so';
CREATE PARSER CsvParser AS LANGUAGE 'C++' NAME ‘CsvParserFactory' LIBRARY CsvParserLib;
CREATE TABLE GL.GL_BALANCES
( LEDGER_ID NUMBER(15,0) NOT NULL,
CODE_COMBINATION_ID NUMBER(15,0) NOT NULL,
CURRENCY_CODE VARCHAR2(15) NOT NULL,
PERIOD_NAME VARCHAR2(15) NOT NULL,
ACTUAL_FLAG VARCHAR2(1) NOT NULL,
LAST_UPDATE_DATE DATE NOT NULL,
LAST_UPDATED_BY NUMBER(15,0) NOT NULL,
BUDGET_VERSION_ID NUMBER(15,0),
ENCUMBRANCE_TYPE_ID NUMBER(15,0),
TRANSLATED_FLAG VARCHAR2(1)
);
copy GL.GL_BALANCES from '/home/dbadmin/Data/GLtest.csv' with parser CsvParser() no commit;
GLtest.csv >
289,110052,GBP,APR-01/02,B,1-Oct-03,1001406,1637,,
289,110053,GBP,APR-01/02,A,1-Oct-03,1001406,,,
.....
On a different note, the above compilation creates file *.so: instead of *.so and I have to rename it to *.so before creating library. If I put a space between .so and :, I get error. What am I doing wrong?
Re: UDF CsvParser can't load Date
Hi!
hm... now I understand.
I think its a bug:
Lets check formats:
Now parser:
No errors
Now will try to format our date:
hm... ok... will try column option:
and now error
>> What is libboost-dev?
Without header tokenizer.hpp it wont compile. May be on your distro it called little different.
>> On a different note, the above compilation creates file *.so: instead of *.so ...
A name of a lib doesn't matter. The best way to verify that everything is ok, that all linked:
You must resolve any "not found" or "none"
>> What am I doing wrong?
Why do you have ':' in command line? Did you copy it from make file? I think its come from make file syntax, you dont need it in the command line.
hm... now I understand.
I think its a bug:
Lets check formats:
Code: Select all
daniel=> select '2018-01-21'::date;
?column?
------------
2018-01-21
(1 row)
Code: Select all
daniel=> select to_date('2018-01-21', 'yyyy-mm-dd');
to_date
------------
2018-01-21
(1 row)
Code: Select all
daniel=> copy GL_BALANCES from stdin with parser CsvParser() abort on error ;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 289,110052,GBP,APR-01/02,B,2018-01-21,1001406,1637,,
>> \.
Now will try to format our date:
Code: Select all
daniel=> copy GL_BALANCES from stdin with parser CsvParser(format='d-Mon-YY') abort on error ;
ERROR 0: Error calling prepare() in User Function CsvParserFactory at [./TraditionalCsvParser.cpp:372], error code: 0, message: Parameter format can only be used as a column option
Code: Select all
daniel=> copy GL_BALANCES column option (LAST_UPDATE_DATE format 'yyyy-mm-dd') from stdin with parser CsvParser() abort on error ;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 289,110052,GBP,APR-01/02,B,2018-01-21,1001406,1637,,
>> \.
ERROR 2035: COPY: Input record 1 has been rejected (Invalid column: name=LAST_UPDATE_DATE, index=6, type=Date(-1), value=2018-01-21)
>> What is libboost-dev?
Without header tokenizer.hpp it wont compile. May be on your distro it called little different.
The Vertica SDK provides example parser functions in /opt/vertica/sdk/examples/ParserFunctions. Included in the examples are two CSV parsers; "Traditional CSV" and an RFC 4180 CSV parser. The traditional CSV parser uses the boost::tokenizer library to read the CSV output from common programs such as Microsoft Excel. The RFC 4180 parser parses CSV files written to the RFC 4180 standard and uses libcsv.
Code: Select all
cat TraditionalCsvParser.cpp
/* Copyright (c) 2005 - 2012 Vertica, an HP company -*- C++ -*- */
#include "ContinuousUDParser.h"
#include "StringParsers.h"
#include <pthread.h>
#include <boost/tokenizer.hpp> //<-- you need boost for this header
...
A name of a lib doesn't matter. The best way to verify that everything is ok, that all linked:
Code: Select all
$ ldd CsvParser.so
linux-vdso.so.1 => (0x00007fffa4bc4000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f92fd52a000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f92fd314000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f92fcf54000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f92fcc58000)
/lib64/ld-linux-x86-64.so.2 (0x00007f92fdb20000)
>> What am I doing wrong?
Why do you have ':' in command line? Did you copy it from make file? I think its come from make file syntax, you dont need it in the command line.
Re: UDF CsvParser can't load Date
Thanks for all your help. Let me see if I can figure out where the bug is in the code and try to fix it.
Re: UDF CsvParser can't load Date
Need help with CSVParser - If I change the format of the date to 10/1/2013, it works fine for small loads, but for bigger loads, it fails at last column. This is the error it logs in error.log.
COPY: Input record 1 has been rejected (Invalid column: name=PROJECT_TO_DATE_CR_BEQ, index=37, length=1,type=Numeric(38,0), value=
.). Please see /home/dbadmin/VMartDB/v_vmartdb_node0001_catalog/CopyErrorLogs/GL_BALANCES-GL2.csv-copy-from-rejected-data, record 1 for the rejected record.
Any idea what is going wrong here and why the date format 01-Oct-03 does not work but 10/1/2003 does?
COPY: Input record 1 has been rejected (Invalid column: name=PROJECT_TO_DATE_CR_BEQ, index=37, length=1,type=Numeric(38,0), value=
.). Please see /home/dbadmin/VMartDB/v_vmartdb_node0001_catalog/CopyErrorLogs/GL_BALANCES-GL2.csv-copy-from-rejected-data, record 1 for the rejected record.
Any idea what is going wrong here and why the date format 01-Oct-03 does not work but 10/1/2003 does?
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: UDF CsvParser can't load Date
What is your delimiter? The error is actually saying you are referencing an invalid column name...
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.