Page 1 of 1

MAPLOOKUP / MapJSONExtractor relative slowness

Posted: Fri Nov 09, 2018 3:07 pm
by _pj_
Hi, All

We have the case that Json is pushed from outer system to our Vertica database into tables with longvarchar columns. We do not have any possibility to change this process currently.

In general the Jsons are simple linear ones, so we could in theory just parse those with MAPLOOKUP/MapJSONExtractor like that:

Code: Select all

SELECT
    public.MAPLOOKUP(public.MapJSONExtractor(xJSON), 'id') AS id
  , public.MAPLOOKUP(public.MapJSONExtractor(xJSON), 'value') AS value
FROM /*EXAMPLE_DATA*/
    (
     SELECT
            '{"id":1,"value":1000}' AS xJSON
  UNION ALL
     SELECT
            '{"id":2,"value":2000}' AS xJSON ) EXAMPLE_DATA
WHERE
    1 = 1
/*
id	value
1	1000
2	2000
*/
The problem is that MapJSONExtractor seams really slow with high volumes.

The throughput we get:
1. Select the data out to our ETL tool (Pentaho Data Integration) and parse the JSON there and write back to database 10million records in 4 minutes.
2. Taking some risks (casting it to varchar so string functions can be applied, and then parsing the JSON with string functions like SPLIT_PART etc) then we get 10 million rows parsed in 2 minutes.
3. With MapJSONExtractor the 10 million rows take 6 minutes to be parsed.

So with billions of rows the difference is significant.

We do not like to take the risks of handling the JSON as string and parsing with string functions, but as it is so much faster then we really do not see an better option at the moment than to go with option 2. string parsing. The same example than above with SPLIT_PART:

Code: Select all

SELECT
    SPLIT_PART(SPLIT_PART(xJSON, '"id":',2),',',1) AS id
  , SPLIT_PART(SPLIT_PART(SPLIT_PART(xJSON, '"value":',2),',',1),'}',1) AS value
FROM /*EXAMPLE_DATA*/
    (
     SELECT
            '{"id":1,"value":1000}' AS xJSON
  UNION ALL
     SELECT
            '{"id":2,"value":2000}' AS xJSON ) EXAMPLE_DATA
WHERE
    1 = 1
/*
id	value
1	1000
2	2000
*/
So does anybody know a better option for this? Is there some way to make the MapJSONExtractor faster? Doesn't this sound like a bug in MapJSONExtractor if it is so much slower?

Br,
pj