DuckDB analysis

Superstore assumptions check

Dataset: superstore.duckdb with 8,399 rows. Each section below tests one assumption with SQL run directly on DuckDB.

1. Technology is the strongest category

Test: compare category sales, profit, and share of total output.

Verdict: correct

Technology leads both revenue and profit. Furniture has substantial sales but a much weaker profit contribution.

Category summary
SELECT
      "Product Category" AS category,
      COUNT(*) AS rows,
      ROUND(SUM(Sales),2) AS sales,
      ROUND(SUM(Profit),2) AS profit,
      ROUND(100.0 * SUM(Profit) / SUM(Sales),2) AS margin_pct
    FROM superstore
    GROUP BY 1
    ORDER BY sales DESC
category rows sales profit margin_pct
Technology 2065 5984248.18 886313.52 14.81
Furniture 1724 5178590.54 117433.03 2.27
Office Supplies 4610 3752762.10 518021.43 13.80
Category share of total
WITH totals AS (
      SELECT SUM(Sales) AS total_sales, SUM(Profit) AS total_profit FROM superstore
    )
    SELECT
      s."Product Category" AS category,
      ROUND(100.0 * SUM(s.Sales) / t.total_sales, 2) AS sales_share_pct,
      ROUND(100.0 * SUM(s.Profit) / t.total_profit, 2) AS profit_share_pct
    FROM superstore s, totals t
    GROUP BY 1, t.total_sales, t.total_profit
    ORDER BY profit_share_pct DESC
category sales_share_pct profit_share_pct
Technology 40.12 58.24
Office Supplies 25.16 34.04
Furniture 34.72 7.72

2. Discounts hurt profitability

Test: compare discounted vs non-discounted orders, discount buckets, and correlation.

Verdict: correct

Discounted orders have lower average profit and margin. The correlation is negative, although weak, which means the effect exists but is not the only driver.

Discounted vs no discount
SELECT
      CASE WHEN Discount = 0 THEN 'No discount' ELSE 'Discounted' END AS discount_state,
      COUNT(*) AS rows,
      ROUND(AVG(Discount),3) AS avg_discount,
      ROUND(AVG(Profit),2) AS avg_profit,
      ROUND(AVG(Sales),2) AS avg_sales,
      ROUND(100.0 * SUM(Profit) / SUM(Sales),2) AS margin_pct
    FROM superstore
    GROUP BY 1
    ORDER BY 1
discount_state rows avg_discount avg_profit avg_sales margin_pct
Discounted 7643 0.055 174.48 1756.10 9.94
No discount 756 0.000 248.93 1975.86 12.60
Profit by discount bucket
WITH x AS (
      SELECT
        CASE
          WHEN Discount = 0 THEN '0%'
          WHEN Discount <= 0.1 THEN '0-10%'
          WHEN Discount <= 0.2 THEN '10-20%'
          WHEN Discount <= 0.3 THEN '20-30%'
          ELSE '30%+'
        END AS discount_bucket,
        Profit,
        Sales,
        Discount
      FROM superstore
    )
    SELECT
      discount_bucket,
      COUNT(*) AS rows,
      ROUND(AVG(Discount),3) AS avg_discount,
      ROUND(SUM(Sales),2) AS sales,
      ROUND(SUM(Profit),2) AS profit,
      ROUND(100.0 * SUM(Profit) / SUM(Sales),2) AS margin_pct
    FROM x
    GROUP BY 1
    ORDER BY CASE discount_bucket WHEN '0%' THEN 1 WHEN '0-10%' THEN 2 WHEN '10-20%' THEN 3 WHEN '20-30%' THEN 4 ELSE 5 END
discount_bucket rows avg_discount sales profit margin_pct
0% 756 0.000 1493748.22 188188.77 12.60
0-10% 7638 0.055 13419584.33 1334063.20 9.94
10-20% 3 0.147 1265.97 14.80 1.17
20-30% 2 0.230 1002.30 -498.79 -49.76
Discount / profit correlation
SELECT
      ROUND(corr(Discount, Profit), 6) AS discount_profit_corr,
      ROUND(corr(Discount, Profit / NULLIF(Sales,0)), 6) AS discount_margin_corr
    FROM superstore
discount_profit_corr discount_margin_corr
-0.037128 -0.057844

3. Furniture is the weakest category financially

Test: compare loss rate and profit across categories, and isolate Furniture directly.

Verdict: correct

Furniture has the lowest margin and the smallest profit share by a wide margin. Its loss rate is also the highest, just ahead of Office Supplies.

Loss rate by category
SELECT
      "Product Category" AS category,
      COUNT(*) AS rows,
      SUM(CASE WHEN Profit < 0 THEN 1 ELSE 0 END) AS loss_rows,
      ROUND(100.0 * SUM(CASE WHEN Profit < 0 THEN 1 ELSE 0 END) / COUNT(*),2) AS loss_rate_pct,
      ROUND(SUM(Profit),2) AS total_profit
    FROM superstore
    GROUP BY 1
    ORDER BY loss_rate_pct DESC
category rows loss_rows loss_rate_pct total_profit
Furniture 1724 922.0 53.48 117433.03
Office Supplies 4610 2461.0 53.38 518021.43
Technology 2065 881.0 42.66 886313.52
Furniture only
SELECT
      COUNT(*) AS rows,
      SUM(CASE WHEN Profit < 0 THEN 1 ELSE 0 END) AS loss_rows,
      ROUND(100.0 * SUM(CASE WHEN Profit < 0 THEN 1 ELSE 0 END) / COUNT(*),2) AS loss_rate_pct,
      ROUND(SUM(Profit),2) AS total_profit
    FROM superstore
    WHERE "Product Category" = 'Furniture'
rows loss_rows loss_rate_pct total_profit
1724 922.0 53.48 117433.03

Shipping sanity check

Not one of the three assumptions, but useful context for the delivery fields.

Ship mode summary
WITH x AS (
      SELECT
        "Ship Mode" AS ship_mode,
        date_diff('day', strptime("Order Date", '%m/%d/%Y'), strptime("Ship Date", '%m/%d/%Y')) AS ship_days,
        "Shipping Cost" AS shipping_cost,
        Profit,
        Sales
      FROM superstore
    )
    SELECT
      ship_mode,
      COUNT(*) AS rows,
      ROUND(AVG(ship_days),2) AS avg_ship_days,
      ROUND(AVG(shipping_cost),2) AS avg_shipping_cost,
      ROUND(AVG(Profit),2) AS avg_profit,
      ROUND(AVG(Sales),2) AS avg_sales
    FROM x
    GROUP BY 1
    ORDER BY avg_ship_days
ship_mode rows avg_ship_days avg_shipping_cost avg_profit avg_sales
Express Air 983 2.00 7.99 149.98 1204.90
Regular Air 6270 2.04 7.66 176.19 1197.18
Delivery Truck 1146 2.04 45.35 235.29 5431.83
Ship-day distribution
SELECT
      "Ship Mode" AS ship_mode,
      ROUND(AVG(date_diff('day', strptime("Order Date", '%m/%d/%Y'), strptime("Ship Date", '%m/%d/%Y'))),2) AS avg_days,
      MIN(date_diff('day', strptime("Order Date", '%m/%d/%Y'), strptime("Ship Date", '%m/%d/%Y'))) AS min_days,
      MAX(date_diff('day', strptime("Order Date", '%m/%d/%Y'), strptime("Ship Date", '%m/%d/%Y'))) AS max_days
    FROM superstore
    GROUP BY 1
    ORDER BY avg_days
ship_mode avg_days min_days max_days
Express Air 2.00 0 19
Regular Air 2.04 0 92
Delivery Truck 2.04 0 17