Crosstab or Pivot Function

ncreekmur · Post by **ncreekmur** » Sat Feb 23, 2013 12:12 am

Let me start off by saying I can express my requirements using SQL... however it is slow. In MS SQL there existed the PIVOT function where you could define a grouping, join to a table with many records associated with the grouping and shift these values into columns.

I'll apologize for the poor setup, but its after 5 o clock and I'm tired.

Example:

Table = EmployeeAggregates

[1stRecord]
EmployeeID = 1
Lname = Johnson
Fname = Jack
ClockIn = 9:00 am
ClockOut = 5:00 pm
LunchTime = 1:00 pm
AverageBreakTime = 45 minutes

Table = EmployeeNonsense

[1stRecord]
EmployeeID = 1
Food = Banana
TimesEaten = 5

[2ndRecord]
EmployeeID = 1
Food = Apple
TimesEaten = 4

[3rdRecord]
EmployeeID = 1
Food = Pear
TimesEaten = 1

I need to present a record as:
EmployeeID , Fname , Lname , ClockIn , ClockOut , LunchTime , AverageBreakTime , Fav_Food1, Fav_Food2, Fav_Food3

Its very easy RANK the foods based on activity. No problem. Fast.

Sorting out the joins for hundreds of millions of users in a non-unique fashion to smash them into a single row is not. I could do this in a few minutes on MS SQL with a fair volume of records, and once I learned to use PIVOT it was very quick. Recreating this in Vertica is not easy. What's more is recreating it and accounting for the fact that some employees never eat food (I'm obviously masking data based on company rules here, Pear with me). I've come at it many different ways and would be curious to see how someone else might try to solve it since my current solutions are not performant enough for a production environment and I'm about to request a schema change to produce 2 files instead of 1 (which is of course practically instant in Vertica).

Thanks in advance!

ncreekmur · Post by **ncreekmur** » Mon Feb 25, 2013 5:12 pm

Forgot about some code I wrote a long time ago on MS SQL to perform the crosstab before I learned about the keyword. Works quickly and sets us up to use a left join on a subquery so I think I sorted this out. Here's the code, masked again of course so forgive me if it doesn't tie back to my previous example perfectly:

(select EmployeeID,
max((CASE when RNK = 1 then Food end)) as Food1,
MAX((CASE when RNK = 2 then Food end)) as Food2,
MAX((CASE when RNK = 3 then Food end)) as Food3
from
(select EmployeeID, Food RANK() OVER (Partition by EmployeeID order by TimesEaten desc) RNK
from EmployeeNonsense order by 1) ranks
group by EmployeeID order by 1)

Its important to note that this doesn't work unless you use an aggregate function on the 'Food' column since if you group by EmployeeID and Food your rows won't clump together based on EmployeeID... so since I'm evaluating a food (varchar) I opted for Max... but I knew there was only going to be 1 with a Rank 1 of so Max or Min would work... with a numeric field you could even sum it and not hurt your output...

I guess this is kind of a trick, but it is accurate, effective, and swift.

Hope this helps someone else in the future.

id10t · Post by **id10t** » Fri Mar 01, 2013 10:46 pm

HI!

Code: Select all

dbadmin=> select * from EmployeeAggregates;
 EmployeeID |  Lname  | Fname | ClockIn  | ClockOut | LunchTime | AverageBreakTime 
------------+---------+-------+----------+----------+-----------+------------------
          1 | Johnson | Jack  | 09:00:00 | 17:00:00 | 13:00:00  | 45
          2 | Smith   | John  | 09:00:00 | 17:00:00 | 13:00:00  | 30
(2 rows)

Code: Select all

dbadmin=> select * from EmployeeNonsense;
 EmployeeID |  Food  | TimesEaten 
------------+--------+------------
          1 | Apple  |          4
          1 | Banana |          5
          1 | Pear   |          1
          2 | Apple  |          2
          2 | Mango  |          3
          2 | Pear   |          4
(6 rows)

Code: Select all

dbadmin=> \p
select a.*,
       b.f1 as Fav1,
       b.f2 as Fav2
  from EmployeeAggregates as a inner join (
select rank() over (partition by EmployeeID order by TimesEaten Desc) as rank,
       EmployeeID,
       Food as f1,
       lead(Food, 1) OVER (PARTITION BY EmployeeID ORDER BY TimesEaten Desc) as f2
  from EmployeeNonsense) as b
    on a.EmployeeID = b.EmployeeID
       and b.rank = 1;

Code: Select all

dbadmin=> \g
 EmployeeID |  Lname  | Fname | ClockIn  | ClockOut | LunchTime | AverageBreakTime |  Fav1  | Fav2  
------------+---------+-------+----------+----------+-----------+------------------+--------+-------
          1 | Johnson | Jack  | 09:00:00 | 17:00:00 | 13:00:00  | 45               | Banana | Apple
          2 | Smith   | John  | 09:00:00 | 17:00:00 | 13:00:00  | 30               | Pear   | Mango
(2 rows)

Crosstab or Pivot Function

Crosstab or Pivot Function

Re: Crosstab or Pivot Function

Re: Crosstab or Pivot Function