Performance of the load_streams System Table

Moderator: NorbertKrupa

Post Reply
harryrundles
Intermediate
Intermediate
Posts: 96
Joined: Thu Jul 19, 2012 12:33 am

Performance of the load_streams System Table

Post by harryrundles » Wed Oct 03, 2012 4:17 pm

Hi guys,

Anyone know why querying the load_streams system table is so slow? Is there a way to speed it up? For instance, it takes > 10 seconds to get results while I'm loading a table!

For instance:

Code: Select all

dbadmin=> SELECT table_name, accepted_row_count, rejected_row_count FROM load_streams;
-[ RECORD 1 ]------+--------------------------
table_name         | store_billing_line_item
accepted_row_count | 65625600
rejected_row_count | 0

Time: First fetch (1 row): 10378.107 ms. All rows formatted: 10378.149 ms
Thanks,
Harry

User avatar
Julie
Master
Master
Posts: 221
Joined: Thu Apr 19, 2012 9:29 pm

Re: Performance of the load_streams System Table

Post by Julie » Wed Oct 03, 2012 6:06 pm

Hi Harry,

I have that same question! It doesn't seem to matter how many tables we're loading, it always takes about 10 seconds to get a response back when querying the load_streams table. People here refer to it as the "lazy" streams table :P
Thanks,
Juliette

jpcavanaugh
Intermediate
Intermediate
Posts: 149
Joined: Mon Apr 30, 2012 10:04 pm
Location: New York
Contact:

Re: Performance of the load_streams System Table

Post by jpcavanaugh » Wed Oct 03, 2012 9:06 pm

Looks like its a known issue for Vertica (which is good).

harryrundles
Intermediate
Intermediate
Posts: 96
Joined: Thu Jul 19, 2012 12:33 am

Re: Performance of the load_streams System Table

Post by harryrundles » Thu Oct 04, 2012 7:36 pm

Cool! So this is something that will be fixed in a future release? I hope that the response time will be almost real time so that we could write a small app to display the row counts as the copy command runs.
Thanks,
Harry

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: Performance of the load_streams System Table

Post by JimKnicely » Fri Oct 05, 2012 1:13 pm

Hi Harry! I was planning to do that too (query the the LOAD_STREAMS table to create a nice little interface for developers who are loading data). I'm glad to hear that the performance of this table will be fixed in the future. Thanks for asking the question!
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

alamb
Newbie
Newbie
Posts: 7
Joined: Tue Jul 31, 2012 6:22 pm

Re: Performance of the load_streams System Table

Post by alamb » Sat Oct 06, 2012 1:31 pm

Here is some engineering perspective:

The load_streams table (now) contains data on both ongoing loads and historical loads -- but the underlying mechanism for the historical data is not particularly optimized and can be big.

on 6.0, If you want a faster way to get access to the information in load_streams on currently running load streams (no loads that have finished), you can try the following query (which is the definition of v_monitor.load_streams with all references to v_monitor.execution_engine_profiles changed to vs_execution_engine_profiles)

Your mileage will vary -- this query will not be officially supported, etc.

Code: Select all

-- Load metrics for each load stream on each node. 
-- input_size_file_bytes & parse_complete should be NULL if loading from STDIN or FIFO. 
-- However, sum() returns an int if load is from say files and some FIFOs. To get around this problem, the sub-query returns total_rows & not_null rows. 
-- If they arent equal, then one of the sources is a FIFO and NULL is chosen instead of sum(input_file_size_bytes)
select session_id,
       table_accepted_row_count.transaction_id,
       table_accepted_row_count.statement_id,
       stream_name, schema_name,
       table_name,
       load_start,
       load_duration_ms,
       is_executing,
       accepted_row_count,
       rejected_row_count,
       read_bytes,
       case when total_rows = num_not_nulls then input_size
	   		else null end as input_file_size_bytes,
       case when total_rows = num_not_nulls and input_size > 0 then read_bytes * 100 // input_size
            else null end as parse_complete_percent,
       unsorted_row_count,
       sorted_row_count,
       (case when unsorted_row_count > 0 then sorted_row_count * 100 // unsorted_row_count else null end) as sort_complete_percent
from
    (select transaction_id, statement_id, session_id, identifier as stream_name,
            query_start as load_start,
            query_duration_us // 1000 as load_duration_ms,
            is_executing
     from v_monitor.query_profiles
     where query_type = 'LOAD' and (error_code = 0 or error_code is null)) as query_profiles
natural join
    (select transaction_id, statement_id, sum(counter_value) as accepted_row_count
     from vs_execution_engine_profiles
     where operator_name = 'Load' and counter_name = 'rows produced'
     group by transaction_id, statement_id) as table_accepted_row_count
natural join
    (select distinct transaction_id, statement_id, table_schema as schema_name, table_name
     from v_internal.dc_projections_used) as table_schema_name
natural join
    (select transaction_id, statement_id, sum(counter_value) as rejected_row_count
     from vs_execution_engine_profiles
     where operator_name = 'Load' and counter_name = 'rows rejected'
     group by transaction_id, statement_id) as table_rejected_row_count
natural join
    (select transaction_id, statement_id, count(*) as total_rows, 
            count(counter_value) as num_not_nulls, sum(counter_value) as input_size
      from vs_execution_engine_profiles
      where (operator_name = 'Load' or operator_name = 'LoadUnion') and counter_tag = 'main' and counter_name like 'input size%'
      group by transaction_id, statement_id) as table_input_file_size
natural join
    (select transaction_id, statement_id, sum(counter_value) as read_bytes
     from vs_execution_engine_profiles
     where operator_name = 'Load' and counter_name like 'read %'
     group by 1,2) as table_read_bytes
natural join
    (select transaction_id, statement_id, sum(counter_value) as unsorted_row_count
     from vs_execution_engine_profiles
     where operator_name = 'DataTarget' and counter_name = 'input rows'
     group by 1,2) as table_unsorted_row_count
natural join
    (select transaction_id, statement_id, sum(counter_value) as sorted_row_count
     from vs_execution_engine_profiles
     where operator_name = 'DataTarget' and counter_name = 'written rows'
     group by 1,2) as table_sorted_row_count
order by is_executing desc, load_start desc
;

jpcavanaugh
Intermediate
Intermediate
Posts: 149
Joined: Mon Apr 30, 2012 10:04 pm
Location: New York
Contact:

Re: Performance of the load_streams System Table

Post by jpcavanaugh » Sat Oct 06, 2012 6:03 pm

Thanks Andrew!!

Post Reply

Return to “Vertica Data Load”