Postgres code to calculate customer revenue retention by month

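I'm working on a personal project to calculate monthly customer retention over one year. Basically: what percentage of customers has been retained since their first month, with customers grouped into cohorts by the month they joined (people who joined in January are in the January cohort, and so on).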

I'm a true beginner and trying to learn, so any help would be appreciated! Apologies if any of this is rudimentary.

Tech stack + versions: pgAdmin 4, version 6.19 (desktop, Windows 10)

Current outcome: I'm getting an error, specifically around the grouping, and I am not sure how to fix it.

Expected outcome: cohort percentages (the original post linked a screenshot of the desired output here).

Thank you in advance for your help!

CREATE TABLE customer_revenue (
    customer_name varchar (100),
    revenue_jan DECIMAL(10,0),
    revenue_feb DECIMAL(10,0),
    revenue_mar DECIMAL(10,0),
    revenue_april DECIMAL(10,0),
    revenue_may DECIMAL(10,0),
    revenue_june DECIMAL(10,0),
    revenue_july DECIMAL(10,0),
    revenue_aug DECIMAL(10,0),
    revenue_sept DECIMAL(10,0),
    revenue_oct DECIMAL(10,0),
    revenue_nov DECIMAL(10,0),
    revenue_dec DECIMAL(10,0));
    


COPY customer_revenue (customer_name, revenue_jan, revenue_feb, revenue_mar, revenue_april, revenue_may, revenue_june, revenue_july, revenue_aug, revenue_sept, revenue_oct, revenue_nov, revenue_Dec)
-- replaced my directory name with a placeholder; COPY expects a quoted file path here
FROM '[MY DIRECTORY]'
WITH (FORMAT CSV, HEADER);




WITH cohort_items AS (
  SELECT
    customer_name,
    CASE
      WHEN revenue_jan > 0 THEN 'January'
      WHEN revenue_feb > 0 THEN 'February'
      WHEN revenue_mar > 0 THEN 'March'
      WHEN revenue_april > 0 THEN 'April'
      WHEN revenue_may > 0 THEN 'May'
      WHEN revenue_june > 0 THEN 'June'
      WHEN revenue_july > 0 THEN 'July'
      WHEN revenue_aug > 0 THEN 'August'
      WHEN revenue_sept > 0 THEN 'September'
      WHEN revenue_oct > 0 THEN 'October'
      WHEN revenue_nov > 0 THEN 'November'
      WHEN revenue_dec > 0 THEN 'December'
    END AS cohort_month
  FROM customer_revenue
),
cohort_size AS (
  SELECT cohort_month, COUNT(DISTINCT customer_name) AS num_customers
  FROM cohort_items
  GROUP BY 1
  ORDER BY MIN(CASE cohort_month
                  WHEN 'January' THEN 1
                  WHEN 'February' THEN 2
                  WHEN 'March' THEN 3
                  WHEN 'April' THEN 4
                  WHEN 'May' THEN 5
                  WHEN 'June' THEN 6
                  WHEN 'July' THEN 7
                  WHEN 'August' THEN 8
                  WHEN 'September' THEN 9
                  WHEN 'October' THEN 10
                  WHEN 'November' THEN 11
                  WHEN 'December' THEN 12
              END)
),
B AS (
  SELECT
    C.cohort_month,
    COUNT(DISTINCT C.customer_name) AS num_customers
  FROM cohort_items C
  WHERE EXISTS (
    SELECT 1
    FROM cohort_items
    WHERE cohort_month = C.cohort_month
      AND customer_name = C.customer_name  )
  GROUP BY C.cohort_month
)
SELECT
  B.cohort_month,
  S.num_customers AS total_customers,
  (B.num_customers::float / S.num_customers::float) * 100 AS percentage
FROM B
LEFT JOIN cohort_size S ON B.cohort_month = S.cohort_month
WHERE B.cohort_month IS NOT NULL
GROUP BY b.cohort_month
ORDER BY MIN(CASE B.cohort_month
                WHEN 'January' THEN 1
                WHEN 'February' THEN 2
                WHEN 'March' THEN 3
                WHEN 'April' THEN 4
                WHEN 'May' THEN 5
                WHEN 'June' THEN 6
                WHEN 'July' THEN 7
                WHEN 'August' THEN 8
                WHEN 'September' THEN 9
                WHEN 'October' THEN 10
                WHEN 'November' THEN 11
                WHEN 'December' THEN 12
            END);

Snippet of what my data looks like (formatted as a GitHub Markdown table, as requested):

| Customer ID | Revenue Jan | Revenue Feb | Revenue March |
|---|---|---|---|
| Customer 1 | 3300 | 3300 | 3300 |
| Customer 2 | 9900 | 9900 | 0 |
| Customer 3 | 0 | 8250 | 8250 |
Answer

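It may help to rethink how you approach the problem. You will most likely need to move the data into a different structure to support your query (this is called an "unpivot", because your data is currently "pivoted").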
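As a minimal sketch of what unpivoting means here, assuming PostgreSQL and the customer_revenue table from the question, each wide row can be turned into twelve (customer_name, month, revenue) rows with CROSS JOIN LATERAL over a VALUES list. The alias v and the month codes 1 to 12 are my own choices for illustration, not something from the question:

```sql
-- Unpivot: one output row per customer per month.
-- Assumes the customer_revenue table defined in the question.
SELECT
      c.customer_name
    , v.month      -- 1 = January ... 12 = December (illustrative coding)
    , v.revenue
FROM customer_revenue AS c
CROSS JOIN LATERAL (
    VALUES
          (1,  c.revenue_jan)
        , (2,  c.revenue_feb)
        , (3,  c.revenue_mar)
        , (4,  c.revenue_april)
        , (5,  c.revenue_may)
        , (6,  c.revenue_june)
        , (7,  c.revenue_july)
        , (8,  c.revenue_aug)
        , (9,  c.revenue_sept)
        , (10, c.revenue_oct)
        , (11, c.revenue_nov)
        , (12, c.revenue_dec)
    ) AS v (month, revenue)
ORDER BY
      c.customer_name
    , v.month
;
```

This reads the source table once and yields the same long shape as a chain of twelve UNION ALL branches, which is the more portable way to write it if you are not on PostgreSQL.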

NB: this is the sample data:

INSERT INTO customer_revenue (customer_name, revenue_jan, revenue_feb, revenue_mar, revenue_april, revenue_may, revenue_june, revenue_july, revenue_aug, revenue_sept, revenue_oct, revenue_nov, revenue_dec)
VALUES
  ('Customer A', 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200),
  ('Customer B', null, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300),
  ('Customer C', null, null, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400),
  ('Customer D', null, null, null, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500),
  ('Customer E', null, null, null, null, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600),
  ('Customer F', null, null, null, null, null, 1100, 1200, 1300, 1400, 1500, 1600, 1700),
  ('Customer G', null, null, null, null, null, null, 1300, 1400, 1500, 1600, 1700, 1800),
  ('Customer H', null, null, null, null, null, null, null, 1500, 1600, 1700, 1800, 1900),
  ('Customer I', null, null, null, null, null, null, null, null, 1700, 1800, 1900, 2000),
  ('Customer J', null, null, null, null, null, null, null, null, null, 1900, 2000, 2100)
  ;

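This sample data has 10 customers, each starting in a different month. The following query should help determine the number of customers retained in each month.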

WITH revenue_by_month AS (
      /* "unpivot" the source data into a "normalized" structure */
      SELECT customer_name, 1 AS month, revenue_jan AS revenue   FROM customer_revenue
      UNION ALL
      SELECT customer_name, 2, revenue_feb FROM customer_revenue
      UNION ALL
      SELECT customer_name, 3, revenue_mar FROM customer_revenue
      UNION ALL
      SELECT customer_name, 4, revenue_april FROM customer_revenue
      UNION ALL
      SELECT customer_name, 5, revenue_may FROM customer_revenue
      UNION ALL
      SELECT customer_name, 6, revenue_june FROM customer_revenue
      UNION ALL
      SELECT customer_name, 7, revenue_july FROM customer_revenue
      UNION ALL
      SELECT customer_name, 8, revenue_aug FROM customer_revenue
      UNION ALL
      SELECT customer_name, 9, revenue_sept FROM customer_revenue
      UNION ALL
      SELECT customer_name, 10, revenue_oct FROM customer_revenue
      UNION ALL
      SELECT customer_name, 11, revenue_nov FROM customer_revenue
      UNION ALL
      SELECT customer_name, 12, revenue_dec FROM customer_revenue
      )
  , first_month_by_customer AS (
      /* calculate "first month" per customer */
      SELECT
            customer_name
          , MIN(case when revenue IS NOT NULL then month end) AS first_month
      FROM revenue_by_month
      GROUP BY customer_name
      )
SELECT
      month
    , COUNT(case when revenue IS NOT NULL then first_month_by_customer.customer_name end) AS retention
FROM revenue_by_month 
LEFT JOIN first_month_by_customer
  ON first_month_by_customer.customer_name = revenue_by_month.customer_name
GROUP BY month
ORDER BY month
;

| month | retention |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
| 7 | 7 |
| 8 | 8 |
| 9 | 9 |
| 10 | 10 |
| 11 | 10 |
| 12 | 10 |

https://dbfiddle.uk

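The logic below should match each customer's first-month revenue against the revenue of the subsequent months, so that the expected percentages can be calculated.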

INSERT INTO customer_revenue (customer_name, revenue_jan, revenue_feb, revenue_mar, revenue_april, revenue_may, revenue_june, revenue_july, revenue_aug, revenue_sept, revenue_oct, revenue_nov, revenue_dec)
VALUES
  ('Customer A', 100, 200, 300, 400, 500, 600, 700, 800, 900,  null, null, null),
  ('Customer B', null, 300, 400, 500, 600, 700, 800, 900, 1000,  null, null, null),
  ('Customer C', null, null, 500, 600, 700, 800, 900, 1000, 1100, null, null, null),
  ('Customer D', null, null, null, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500),
  ('Customer E', null, null, null, null, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600),
  ('Customer F', null, null, null, null, null, 1100, 1200, 1300, 1400, 1500, 1600, 1700),
  ('Customer G', null, null, null, null, null, null, 1300, 1400, 1500, 1600, 1700, 1800),
  ('Customer H', null, null, null, null, null, null, null, 1500, 1600, 1700, 1800, 1900),
  ('Customer I', null, null, null, null, null, null, null, null, 1700, 1800, 1900, 2000),
  ('Customer J', null, null, null, null, null, null, null, null, null, 1900, 2000, 2100)
  ;

WITH revenue_by_month AS (
      /* "unpivot" the source data into a "normalized" structure */
      SELECT customer_name, 1 AS month, revenue_jan AS revenue   FROM customer_revenue
      UNION ALL
      SELECT customer_name, 2, revenue_feb FROM customer_revenue
      UNION ALL
      SELECT customer_name, 3, revenue_mar FROM customer_revenue
      UNION ALL
      SELECT customer_name, 4, revenue_april FROM customer_revenue
      UNION ALL
      SELECT customer_name, 5, revenue_may FROM customer_revenue
      UNION ALL
      SELECT customer_name, 6, revenue_june FROM customer_revenue
      UNION ALL
      SELECT customer_name, 7, revenue_july FROM customer_revenue
      UNION ALL
      SELECT customer_name, 8, revenue_aug FROM customer_revenue
      UNION ALL
      SELECT customer_name, 9, revenue_sept FROM customer_revenue
      UNION ALL
      SELECT customer_name, 10, revenue_oct FROM customer_revenue
      UNION ALL
      SELECT customer_name, 11, revenue_nov FROM customer_revenue
      UNION ALL
      SELECT customer_name, 12, revenue_dec FROM customer_revenue
      )
  , first_month_by_customer AS (
        /* calculate first month revenue */
        SELECT
              f.customer_name
            , f.first_month
            , sum(m.revenue) * 1.0 AS first_revenue
        FROM (
            /* calculate "first month" per customer */
            SELECT
                  customer_name
                , MIN(CASE WHEN revenue IS NOT NULL THEN month END) AS first_month
            FROM revenue_by_month
            GROUP BY customer_name
            ) AS f
        INNER JOIN revenue_by_month AS m ON f.customer_name = m.customer_name
                                        AND f.first_month = m.month
        GROUP BY
              f.customer_name
            , f.first_month
      )
SELECT
      m.month
    , SUM(m.retention) AS retained
    , SUM(m.first_revenue) AS first_revenue
    , SUM(m.revenue) AS revenue
    , SUM(m.revenue) / SUM(m.first_revenue) AS retention_pct
FROM (
    SELECT
          m.month
        , m.customer_name
        , COUNT(case when m.revenue IS NOT NULL then f.customer_name end) AS retention
        , SUM(m.revenue) AS revenue
        , SUM(case when m.month >= f.first_month then f.first_revenue end) as first_revenue
    FROM revenue_by_month AS m
    INNER JOIN first_month_by_customer AS f ON f.customer_name = m.customer_name
    GROUP BY
          m.month
        , m.customer_name
    ) AS m
GROUP BY
     m.month
ORDER BY
     month

| month | retained | first_revenue | revenue | retention_pct |
|---|---|---|---|---|
| 1 | 1 | 100.0 | 100 | 1.00000000000000000000 |
| 2 | 2 | 400.0 | 500 | 1.2500000000000000 |
| 3 | 3 | 900.0 | 1200 | 1.3333333333333333 |
| 4 | 4 | 1600.0 | 2200 | 1.3750000000000000 |
| 5 | 5 | 2500.0 | 3500 | 1.4000000000000000 |
| 6 | 6 | 3600.0 | 5100 | 1.4166666666666667 |
| 7 | 7 | 4900.0 | 7000 | 1.4285714285714286 |
| 8 | 8 | 6400.0 | 9200 | 1.4375000000000000 |
| 9 | 9 | 8100.0 | 11700 | 1.4444444444444444 |
| 10 | 7 | 10000.0 | 11200 | 1.12000000000000000000 |
| 11 | 7 | 10000.0 | 11900 | 1.19000000000000000000 |
| 12 | 7 | 10000.0 | 12600 | 1.26000000000000000000 |

fiddle: https://dbfiddle.uk/bindoh4LEJ

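NB: I have not attempted to "pivot" the final result, as there are thousands of examples of how to do that (and you did not specify which database you are actually using, which you should always do in future).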
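If what you want is the percentage of customers retained per cohort, as the question describes, the following is one possible sketch in long (unpivoted) form. It assumes PostgreSQL, one row per customer in customer_revenue, and that a customer counts as "active" in a month when revenue > 0 (which also excludes NULL). That definition and the names cohorts, cohort_size, active_customers and retention_pct are my assumptions, so adjust them to your own rules:

```sql
WITH revenue_by_month AS (
    /* unpivot the twelve revenue columns into rows */
    SELECT c.customer_name, v.month, v.revenue
    FROM customer_revenue AS c
    CROSS JOIN LATERAL (
        VALUES
              (1,  c.revenue_jan),  (2,  c.revenue_feb),  (3,  c.revenue_mar)
            , (4,  c.revenue_april), (5,  c.revenue_may), (6,  c.revenue_june)
            , (7,  c.revenue_july), (8,  c.revenue_aug),  (9,  c.revenue_sept)
            , (10, c.revenue_oct),  (11, c.revenue_nov),  (12, c.revenue_dec)
        ) AS v (month, revenue)
    )
  , cohorts AS (
    /* cohort = first month in which the customer had revenue (assumed rule) */
    SELECT customer_name, MIN(month) AS cohort_month
    FROM revenue_by_month
    WHERE revenue > 0
    GROUP BY customer_name
    )
  , cohort_size AS (
    /* number of customers that started in each cohort month */
    SELECT cohort_month, COUNT(*) AS cohort_customers
    FROM cohorts
    GROUP BY cohort_month
    )
SELECT
      c.cohort_month
    , m.month
    , s.cohort_customers
    , COUNT(DISTINCT c.customer_name) AS active_customers
    , ROUND(100.0 * COUNT(DISTINCT c.customer_name) / s.cohort_customers, 1) AS retention_pct
FROM cohorts AS c
INNER JOIN revenue_by_month AS m
        ON m.customer_name = c.customer_name
       AND m.month >= c.cohort_month
       AND m.revenue > 0
INNER JOIN cohort_size AS s
        ON s.cohort_month = c.cohort_month
GROUP BY
      c.cohort_month
    , m.month
    , s.cohort_customers
ORDER BY
      c.cohort_month
    , m.month
;
```

Because of the inner join on active months, a month in which no customer of a cohort is active produces no row rather than a row with 0. If you need those zeros, you would left join from a full list of cohort and month combinations instead.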




