Grouping the output

These questions are all about grouping your data together.

To complete them you will need to know The following commands: nth_value, ntile, lag, lead,


Read on to Learn about Grouping

Lag and Lead

These functions allow you to examine the next or previous row. You may then compare this value to the current row

Consider a race where we wanted to see the time of the person in front of us and the amount of time we beat the person behind us by

 select name, time,
 lag(time, 1) over (order by time) as time_of_person_infront_of_me,
 lead(time, 1) over (order by time) - time as how_much_i_was_infront_of_ther_person_behind_me
 FROM runners order by time
 
nametimetime_of_person_infront_of_mehow_much_i_was_infront_of_ther_person_behind_me
andy101None2
bob1031011
cedric1041030
dave1041044
eric108104None

nth_value (& first_value) and ntile

Unsurprisngly nth_value will return the nth_value, but if we do not specify a range it will return null if the current value is less than the nth. If we always want something displayed we need to specify a range

ntile(n) divides the group into n equal partitions and denotes where partition each row is in.

Let us go back to our race and print how much faster a runner needs to go to finish on the podium (1st, 2nd or 3rd), the time of the second runner and finally we will use ntile(2) to determine if they are in the top half of runners

 select name, time,
 nth_value(time, 3) over (order by time) - time as to_go_faster_to_make_podium,
 nth_value(time, 2) over (order by time RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as time_of_second_runner,
 ntile(2) over (order by time) as which_half
 FROM runners order by time;
nametimeto_go_faster_to_make_podiumtime_of_second_runnerwhich_half
andy101None1031
bob103None1031
cedric10401031
dave10401032
eric108-41032