Ch4. 매출을 파악하기 위한 데이터 추출

시계열 & 다면적 축 이용

9강. 시계열 기반으로 데이터 집계하기

sum, avg 같은 통계를 누적 할 때 쓰는 쿼리문 over(partition by 구분 칼럼명 order by 기준 칼럼명 rows 숫 unbounded preceding and current row) 구분되는 칼럼이 없으면 partition by 생략 가능
LAG : 일정 기간 이전의 값을 추출할 때 쓰는 쿼리문 보통 12개월 전이라고 하면, LAG(monthly, 12) OVER (ORDER BY year,month) --year, month, monthly 다 칼럼명--

최대 7일동안의 평균 vs 최근 7일 동안의 평균

select 
	  dt,
    sum(purchase_amount) as total_amount,
    avg(sum(purchase_amount)) over (order by dt rows between 6 preceding and current row) as seven_day_avg,
    case when 7=count(*) over (order by dt rows between 6 preceding and current row) 
         then avg(sum(purchase_amount)) over (order by dt rows between 6 preceding and current row) 
         end as seven_day_avg_strict

from `purchase_log`

group by dt
order by dt ; 

-- seven_day_avg_amount는 7일이 되지 않아도 평균을 내지만, 
--<case when 7=count(*)> 때문에 seven_day_avg_amount_strict는 7일이 되지 않으면 avg를 하지 않는다.

Z차트 '월차매출', '매출누계', '이동누계' 3개의 지표로 구성되어, 계절 변동의 영향을 배제하고 트랜드를 분석하는 방법 월차매출 : 매출 합계를 월별로 집계 매출누계 : 해당 월의 매출에 이전월까지의 매출 누계를 합한 값 이동누계 : 해당 월의 매출에 과거 11개월의 매출을 합한 값 with 절 이용해서 구할 수 있다

with 
daily_purchase as ( 
    select dt, 
           substring(dt, 1, 4) as year, 
           substring(dt, 6, 2) as month,
           substring(dt, 9, 2) as date,
           sum(purchase_amount) as purchase_amount,
           count (order_id) as orders 
    from `purchase_log`
    group by dt )
, monthly_amount as ( --월별 매출 집계하기, 이전 with 테이블로 만든 daily_purchase 를 가져올 수 있다 
       select  year, 
               month, 
               sum(purchase_amount) as amount
        from daily_purchase
        group by year, month)
 , calc_index as ( 
        select year, month, amount,
               sum(case when year='2015' then amount end)
                  over(order by year, month rows unbounded preceding) as agg_amount,
               sum(amount) over (order by year,month rows between 11 preceding and current row) as year_avg_amount
        from monthly_purchase
        order by year,month)
        
select concat(year,'-',month) as year_month,
       amount,
       agg_amount,
       year_avg_amount
       
from calc_index
where year='2015'
order by year_month ;

10강. 다면적인 축을 사용해 데이터 집약하기

Dril down : 가장 요약된 레벨부터 가장 상세한 레벨까지 차원의 계층에 따라 분석에 필요한 요약 수준을 바꿀 수 있는 기능

매출 구성비계와 등급을 계산하는 쿼리

WITH monthly_sales AS (
    --항목별 매출 계산
    SELECT category1, sum(amount) as amount  
    FROM `purchase_log`
    WHERE dt BETWEEN '2015-12-01' AND '2015-12-31'
    GROUP BY category1 )
,sales_composition_ratio AS (
    SELECT category1, amount,
        --구성비 : 100.0*<항목별 매출> / <전체 매출>
        100.0* amount/SUM(amount) OVER() AS compostion_ratio, 
        -- 구성비누계 : 100.0* <항목별 ㅜ계 매출> / <전체 매출>
        100.0* SUM(amount) OVER (ORDER BY amount DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        / SUM(amount) OVER() AS cumulative_ratio 
    FROM `monthly_sales` )
        
select * 
    -- 구성비누계 범위에 따라 순위 붙이기 
    , CASE WHEN cumulative_ratio BETWEEN 0 AND 70 THEN 'A'
           WHEN cumulative_ratio BETWEEN 70 AND 90 THEN 'B'
           WHEN cumulative_ratio BETWEEN 90 AND 100 THEN 'C'
       END AS abc_rank 
       
 FROM `sales_composition_ratio`
 ORDER BY amount DESC ;

팬 차트 : 어떤 기준을 100%로 두고, 이후의 숫자 변동을 확인할 수 있게 해주는 그래프. 작은 변화도 알아차리기 쉽다.

히스토그램 : 가로축에 단계 (데이터의 범위) , 세로축에 도수 (데이터의 개수) 데이터가 어떻게 분산되어 있는지 확인할 수 있다 SQL에서는 히스토그램을 작성하는 함수가 표준 제공됨 !!! 계급 상한을 <금액의 최댓값>+1 해서 모든 레코드가 계급 상한 미만이 되게 만들어준다

WITH stats AS( 
    SELECT MAX(price) AS max_price,
           MIN(price) AS min_price,
           MAX(price)-MIN(price) AS range_price,
           10 AS bucket_num FROM purchase_detail_log)
, purchase_log_with_bucket AS(
SELECT price, min_price, 
price-min_price AS diff,
1.0 * range_price / bucket_num AS bucket_range 
--계층 판정 : FLOOR (<정규화 금액>/<계층 범위>)
,FLOOR(1.0 (price-min_price) / (1.0 * range_price / bucket_num)
--index가 1부터 시작하므로 1만큼 더하기 
)+1 AS bucket 

FROM purchase_detail_log,stats 
) 
SELECT * 
FROM purchase_log_with_bucket
ORDER BY amount ;

Previous데이터 분석을 위한 SQL 레시피 NextCh5. 사용자를 파악하기 위한 데이터 추출

Last updated 5 years ago

Was this helpful?