Let me start off by saying I can express my requirements using SQL... however it is slow. In MS SQL there existed the PIVOT function where you could define a grouping, join to a table with many records associated with the grouping and shift these values into columns.
I'll apologize for the poor setup, but its after 5 o clock and I'm tired.
Example:
Table = EmployeeAggregates
[1stRecord]
EmployeeID = 1
Lname = Johnson
Fname = Jack
ClockIn = 9:00 am
ClockOut = 5:00 pm
LunchTime = 1:00 pm
AverageBreakTime = 45 minutes
Table = EmployeeNonsense
[1stRecord]
EmployeeID = 1
Food = Banana
TimesEaten = 5
[2ndRecord]
EmployeeID = 1
Food = Apple
TimesEaten = 4
[3rdRecord]
EmployeeID = 1
Food = Pear
TimesEaten = 1
I need to present a record as:
EmployeeID , Fname , Lname , ClockIn , ClockOut , LunchTime , AverageBreakTime , Fav_Food1, Fav_Food2, Fav_Food3
Its very easy RANK the foods based on activity. No problem. Fast.
Sorting out the joins for hundreds of millions of users in a non-unique fashion to smash them into a single row is not. I could do this in a few minutes on MS SQL with a fair volume of records, and once I learned to use PIVOT it was very quick. Recreating this in Vertica is not easy. What's more is recreating it and accounting for the fact that some employees never eat food (I'm obviously masking data based on company rules here, Pear with me). I've come at it many different ways and would be curious to see how someone else might try to solve it since my current solutions are not performant enough for a production environment and I'm about to request a schema change to produce 2 files instead of 1 (which is of course practically instant in Vertica).
Thanks in advance!
Crosstab or Pivot Function
Moderator: NorbertKrupa
Re: Crosstab or Pivot Function
Forgot about some code I wrote a long time ago on MS SQL to perform the crosstab before I learned about the keyword. Works quickly and sets us up to use a left join on a subquery so I think I sorted this out. Here's the code, masked again of course so forgive me if it doesn't tie back to my previous example perfectly:
(select EmployeeID,
max((CASE when RNK = 1 then Food end)) as Food1,
MAX((CASE when RNK = 2 then Food end)) as Food2,
MAX((CASE when RNK = 3 then Food end)) as Food3
from
(select EmployeeID, Food RANK() OVER (Partition by EmployeeID order by TimesEaten desc) RNK
from EmployeeNonsense order by 1) ranks
group by EmployeeID order by 1)
Its important to note that this doesn't work unless you use an aggregate function on the 'Food' column since if you group by EmployeeID and Food your rows won't clump together based on EmployeeID... so since I'm evaluating a food (varchar) I opted for Max... but I knew there was only going to be 1 with a Rank 1 of so Max or Min would work... with a numeric field you could even sum it and not hurt your output...
I guess this is kind of a trick, but it is accurate, effective, and swift.
Hope this helps someone else in the future.
(select EmployeeID,
max((CASE when RNK = 1 then Food end)) as Food1,
MAX((CASE when RNK = 2 then Food end)) as Food2,
MAX((CASE when RNK = 3 then Food end)) as Food3
from
(select EmployeeID, Food RANK() OVER (Partition by EmployeeID order by TimesEaten desc) RNK
from EmployeeNonsense order by 1) ranks
group by EmployeeID order by 1)
Its important to note that this doesn't work unless you use an aggregate function on the 'Food' column since if you group by EmployeeID and Food your rows won't clump together based on EmployeeID... so since I'm evaluating a food (varchar) I opted for Max... but I knew there was only going to be 1 with a Rank 1 of so Max or Min would work... with a numeric field you could even sum it and not hurt your output...
I guess this is kind of a trick, but it is accurate, effective, and swift.
Hope this helps someone else in the future.
Re: Crosstab or Pivot Function
HI!
Code: Select all
dbadmin=> select * from EmployeeAggregates;
EmployeeID | Lname | Fname | ClockIn | ClockOut | LunchTime | AverageBreakTime
------------+---------+-------+----------+----------+-----------+------------------
1 | Johnson | Jack | 09:00:00 | 17:00:00 | 13:00:00 | 45
2 | Smith | John | 09:00:00 | 17:00:00 | 13:00:00 | 30
(2 rows)
Code: Select all
dbadmin=> select * from EmployeeNonsense;
EmployeeID | Food | TimesEaten
------------+--------+------------
1 | Apple | 4
1 | Banana | 5
1 | Pear | 1
2 | Apple | 2
2 | Mango | 3
2 | Pear | 4
(6 rows)
Code: Select all
dbadmin=> \p
select a.*,
b.f1 as Fav1,
b.f2 as Fav2
from EmployeeAggregates as a inner join (
select rank() over (partition by EmployeeID order by TimesEaten Desc) as rank,
EmployeeID,
Food as f1,
lead(Food, 1) OVER (PARTITION BY EmployeeID ORDER BY TimesEaten Desc) as f2
from EmployeeNonsense) as b
on a.EmployeeID = b.EmployeeID
and b.rank = 1;
Code: Select all
dbadmin=> \g
EmployeeID | Lname | Fname | ClockIn | ClockOut | LunchTime | AverageBreakTime | Fav1 | Fav2
------------+---------+-------+----------+----------+-----------+------------------+--------+-------
1 | Johnson | Jack | 09:00:00 | 17:00:00 | 13:00:00 | 45 | Banana | Apple
2 | Smith | John | 09:00:00 | 17:00:00 | 13:00:00 | 30 | Pear | Mango
(2 rows)