source: kdnuggets: sql window functions beyond basics: solving real business problems
level: technical
running totals track cumulative revenue over time. a common business need is showing both monthly revenue and the running sum in one output. sum() with an order by inside over() and the default frame adds every preceding row to the current row. an example from amazon calculates cumulative revenue by month after aggregating daily purchases. the inner query groups purchases by month and filters out returns. the outer query applies the window function, ordered by month, to produce the running total.
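a minimal sketch of this two-step query, run through python's built-in sqlite3 module so it is self-contained. the purchases table, its column names, the sample rows, and the convention that a return carries a negative amount are all assumptions made for illustration, not the dataset from the original question.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (created_at TEXT, purchase_amt INTEGER);
    INSERT INTO purchases VALUES
        ('2024-01-05', 100),
        ('2024-01-20', 50),
        ('2024-01-25', -30),   -- assumption: a negative amount marks a return
        ('2024-02-03', 200),
        ('2024-03-11', 70),
        ('2024-03-15', 30);
""")

running_totals = conn.execute("""
    SELECT month,
           monthly_revenue,
           -- ORDER BY inside OVER() makes the default frame run from the
           -- first row up to the current row, which yields a running total.
           SUM(monthly_revenue) OVER (ORDER BY month) AS cumulative_revenue
    FROM (
        -- Inner query: drop returns, then aggregate daily purchases by month.
        SELECT strftime('%Y-%m', created_at) AS month,
               SUM(purchase_amt)             AS monthly_revenue
        FROM purchases
        WHERE purchase_amt > 0
        GROUP BY strftime('%Y-%m', created_at)
    ) AS monthly
    ORDER BY month
""").fetchall()

for row in running_totals:
    print(row)
```

the order by inside over() is what makes the sum cumulative; with an empty over(), every row would carry the grand total instead.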
gaps and islands identify streaks in sequential data. sessionization groups raw events into sessions where gaps between events stay below a timeout. the pattern uses lag() to compare each row to the previous one and flags the rows where a new streak starts. a cumulative sum of those flags then gives every row a streak id. a linkedin and meta interview question finds the top three longest user visit streaks. the solution removes duplicates, flags streak boundaries, assigns streak ids, counts streak lengths, and ranks them to output users with the longest streaks.
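the steps above can be sketched as one chain of ctes, again run through python's sqlite3 module. the user_visits table, its columns, and the sample rows are hypothetical. two further assumptions: a streak means visits on consecutive calendar days, and each user is ranked by their single longest streak. the original question may define either point differently.

```python
import sqlite3

# Hypothetical visit log; u1 has a duplicate row on 2024-01-02.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_visits (user_id TEXT, visit_date TEXT);
    INSERT INTO user_visits VALUES
        ('u1', '2024-01-01'), ('u1', '2024-01-02'), ('u1', '2024-01-02'),
        ('u1', '2024-01-03'), ('u1', '2024-01-04'), ('u1', '2024-01-08'),
        ('u2', '2024-01-01'), ('u2', '2024-01-02'), ('u2', '2024-01-03'),
        ('u2', '2024-01-05'),
        ('u3', '2024-01-10'), ('u3', '2024-01-11'),
        ('u4', '2024-01-10');
""")

longest_streaks = conn.execute("""
    WITH unique_visits AS (          -- 1. remove duplicate visits
        SELECT DISTINCT user_id, visit_date FROM user_visits
    ),
    flagged AS (                     -- 2. flag rows that start a new streak
        SELECT user_id, visit_date,
               CASE WHEN LAG(visit_date) OVER (
                             PARTITION BY user_id ORDER BY visit_date
                         ) = date(visit_date, '-1 day')
                    THEN 0 ELSE 1 END AS new_streak
        FROM unique_visits
    ),
    streaks AS (                     -- 3. running sum of flags = streak id
        SELECT user_id,
               SUM(new_streak) OVER (
                   PARTITION BY user_id ORDER BY visit_date
               ) AS streak_id
        FROM flagged
    ),
    streak_lengths AS (              -- 4. count the days in each streak
        SELECT user_id, streak_id, COUNT(*) AS streak_length
        FROM streaks
        GROUP BY user_id, streak_id
    ),
    user_best AS (                   -- assumption: keep each user's longest streak
        SELECT user_id, MAX(streak_length) AS streak_length
        FROM streak_lengths
        GROUP BY user_id
    ),
    ranked AS (                      -- 5. rank users by streak length
        SELECT user_id, streak_length,
               DENSE_RANK() OVER (ORDER BY streak_length DESC) AS rnk
        FROM user_best
    )
    SELECT user_id, streak_length
    FROM ranked
    WHERE rnk <= 3
    ORDER BY streak_length DESC, user_id
""").fetchall()

for row in longest_streaks:
    print(row)
```

dense_rank() keeps tied users together, so the output can hold more than three rows when streak lengths tie; row_number() would be the choice if exactly three rows were required.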
cohort analysis groups users by a shared starting event, such as a first purchase. window functions find the anchor date or attribute for each user. first_value() attaches the first merchant to every order for each customer. a doordash question counts first-time orders per merchant: the query uses first_value() to label each order with the customer's first merchant, then joins back to count distinct customers and orders.
percentile analysis uses percentile_cont() to find dynamic thresholds. a google and netflix fraud detection example computes the 95th percentile fraud score per state and flags claims above it.
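a rough sketch of both patterns, first-event labeling and percentile thresholds, using python's sqlite3 module. the orders and claims tables, their columns, and all sample rows are hypothetical. the first query is simplified to counting customers per first merchant and skips the join-back step. the second query swaps in percent_rank() for percentile_cont(), because the sqlite build bundled with python typically does not provide percentile_cont(); on this sample data, with no tied scores, both approaches flag the same claims, but ties can make them differ.

```python
import sqlite3

# Hypothetical tables and rows, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id TEXT, merchant_id TEXT, order_ts TEXT);
    INSERT INTO orders VALUES
        ('c1', 'm1', '2024-01-01 10:00'), ('c1', 'm2', '2024-01-05 12:00'),
        ('c2', 'm2', '2024-01-02 09:00'), ('c2', 'm1', '2024-01-03 18:00'),
        ('c3', 'm1', '2024-01-04 20:00');

    CREATE TABLE claims (claim_id INTEGER, state TEXT, fraud_score REAL);
    INSERT INTO claims VALUES
        (1, 'CA', 0.10), (2, 'CA', 0.20), (3, 'CA', 0.30),
        (4, 'CA', 0.40), (5, 'CA', 0.90),
        (6, 'NY', 0.50), (7, 'NY', 0.55), (8, 'NY', 0.60),
        (9, 'NY', 0.65), (10, 'NY', 0.99);
""")

# Cohort anchor: label every order with the customer's first merchant,
# then count how many customers started with each merchant.
first_merchant_counts = conn.execute("""
    WITH labeled AS (
        SELECT customer_id,
               FIRST_VALUE(merchant_id) OVER (
                   PARTITION BY customer_id ORDER BY order_ts
               ) AS first_merchant
        FROM orders
    )
    SELECT first_merchant, COUNT(DISTINCT customer_id) AS first_time_customers
    FROM labeled
    GROUP BY first_merchant
    ORDER BY first_merchant
""").fetchall()

# Dynamic threshold: flag claims above the 95% mark within each state.
# PERCENT_RANK() is a stand-in here, not the percentile_cont() approach.
flagged_claims = conn.execute("""
    WITH scored AS (
        SELECT claim_id, state,
               PERCENT_RANK() OVER (
                   PARTITION BY state ORDER BY fraud_score
               ) AS pct
        FROM claims
    )
    SELECT claim_id, state
    FROM scored
    WHERE pct > 0.95
    ORDER BY claim_id
""").fetchall()

print(first_merchant_counts)
print(flagged_claims)
```

partitioning by state is what makes the threshold dynamic: each state's claims are compared only against other claims from the same state, not against a single global cutoff.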
why it matters: these patterns let data professionals solve common business problems directly in sql, avoiding extra tools and making analysis faster and more maintainable.