DuckDB analysis
Superstore assumptions check
Dataset: superstore.duckdb with 8,399 rows. Each section below tests one assumption with SQL run directly on DuckDB.
1. Technology is the strongest category
Test: compare category sales, profit, and share of total output.
Verdict: correct
Technology leads both revenue and profit. Furniture has substantial sales but a much weaker profit contribution.
Category summary
SELECT
"Product Category" AS category,
COUNT(*) AS rows,
ROUND(SUM(Sales),2) AS sales,
ROUND(SUM(Profit),2) AS profit,
ROUND(100.0 * SUM(Profit) / SUM(Sales),2) AS margin_pct
FROM superstore
GROUP BY 1
ORDER BY sales DESC
| category |
rows |
sales |
profit |
margin_pct |
| Technology |
2065 |
5984248.18 |
886313.52 |
14.81 |
| Furniture |
1724 |
5178590.54 |
117433.03 |
2.27 |
| Office Supplies |
4610 |
3752762.10 |
518021.43 |
13.80 |
Category share of total
WITH totals AS (
SELECT SUM(Sales) AS total_sales, SUM(Profit) AS total_profit FROM superstore
)
SELECT
s."Product Category" AS category,
ROUND(100.0 * SUM(s.Sales) / t.total_sales, 2) AS sales_share_pct,
ROUND(100.0 * SUM(s.Profit) / t.total_profit, 2) AS profit_share_pct
FROM superstore s, totals t
GROUP BY 1, t.total_sales, t.total_profit
ORDER BY profit_share_pct DESC
| category |
sales_share_pct |
profit_share_pct |
| Technology |
40.12 |
58.24 |
| Office Supplies |
25.16 |
34.04 |
| Furniture |
34.72 |
7.72 |
2. Discounts hurt profitability
Test: compare discounted vs non-discounted orders, discount buckets, and correlation.
Verdict: correct
Discounted orders have lower average profit and margin. The correlation is negative, although weak, which means the effect exists but is not the only driver.
Discounted vs no discount
SELECT
CASE WHEN Discount = 0 THEN 'No discount' ELSE 'Discounted' END AS discount_state,
COUNT(*) AS rows,
ROUND(AVG(Discount),3) AS avg_discount,
ROUND(AVG(Profit),2) AS avg_profit,
ROUND(AVG(Sales),2) AS avg_sales,
ROUND(100.0 * SUM(Profit) / SUM(Sales),2) AS margin_pct
FROM superstore
GROUP BY 1
ORDER BY 1
| discount_state |
rows |
avg_discount |
avg_profit |
avg_sales |
margin_pct |
| Discounted |
7643 |
0.055 |
174.48 |
1756.10 |
9.94 |
| No discount |
756 |
0.000 |
248.93 |
1975.86 |
12.60 |
Profit by discount bucket
WITH x AS (
SELECT
CASE
WHEN Discount = 0 THEN '0%'
WHEN Discount <= 0.1 THEN '0-10%'
WHEN Discount <= 0.2 THEN '10-20%'
WHEN Discount <= 0.3 THEN '20-30%'
ELSE '30%+'
END AS discount_bucket,
Profit,
Sales,
Discount
FROM superstore
)
SELECT
discount_bucket,
COUNT(*) AS rows,
ROUND(AVG(Discount),3) AS avg_discount,
ROUND(SUM(Sales),2) AS sales,
ROUND(SUM(Profit),2) AS profit,
ROUND(100.0 * SUM(Profit) / SUM(Sales),2) AS margin_pct
FROM x
GROUP BY 1
ORDER BY CASE discount_bucket WHEN '0%' THEN 1 WHEN '0-10%' THEN 2 WHEN '10-20%' THEN 3 WHEN '20-30%' THEN 4 ELSE 5 END
| discount_bucket |
rows |
avg_discount |
sales |
profit |
margin_pct |
| 0% |
756 |
0.000 |
1493748.22 |
188188.77 |
12.60 |
| 0-10% |
7638 |
0.055 |
13419584.33 |
1334063.20 |
9.94 |
| 10-20% |
3 |
0.147 |
1265.97 |
14.80 |
1.17 |
| 20-30% |
2 |
0.230 |
1002.30 |
-498.79 |
-49.76 |
Discount / profit correlation
SELECT
ROUND(corr(Discount, Profit), 6) AS discount_profit_corr,
ROUND(corr(Discount, Profit / NULLIF(Sales,0)), 6) AS discount_margin_corr
FROM superstore
| discount_profit_corr |
discount_margin_corr |
| -0.037128 |
-0.057844 |
3. Furniture is the weakest category financially
Test: compare loss rate and profit across categories, and isolate Furniture directly.
Verdict: correct
Furniture has the lowest margin and the smallest profit share by a wide margin. Its loss rate is also the highest, just ahead of Office Supplies.
Loss rate by category
SELECT
"Product Category" AS category,
COUNT(*) AS rows,
SUM(CASE WHEN Profit < 0 THEN 1 ELSE 0 END) AS loss_rows,
ROUND(100.0 * SUM(CASE WHEN Profit < 0 THEN 1 ELSE 0 END) / COUNT(*),2) AS loss_rate_pct,
ROUND(SUM(Profit),2) AS total_profit
FROM superstore
GROUP BY 1
ORDER BY loss_rate_pct DESC
| category |
rows |
loss_rows |
loss_rate_pct |
total_profit |
| Furniture |
1724 |
922.0 |
53.48 |
117433.03 |
| Office Supplies |
4610 |
2461.0 |
53.38 |
518021.43 |
| Technology |
2065 |
881.0 |
42.66 |
886313.52 |
Furniture only
SELECT
COUNT(*) AS rows,
SUM(CASE WHEN Profit < 0 THEN 1 ELSE 0 END) AS loss_rows,
ROUND(100.0 * SUM(CASE WHEN Profit < 0 THEN 1 ELSE 0 END) / COUNT(*),2) AS loss_rate_pct,
ROUND(SUM(Profit),2) AS total_profit
FROM superstore
WHERE "Product Category" = 'Furniture'
| rows |
loss_rows |
loss_rate_pct |
total_profit |
| 1724 |
922.0 |
53.48 |
117433.03 |
Shipping sanity check
Not one of the three assumptions, but useful context for the delivery fields.
Ship mode summary
WITH x AS (
SELECT
"Ship Mode" AS ship_mode,
date_diff('day', strptime("Order Date", '%m/%d/%Y'), strptime("Ship Date", '%m/%d/%Y')) AS ship_days,
"Shipping Cost" AS shipping_cost,
Profit,
Sales
FROM superstore
)
SELECT
ship_mode,
COUNT(*) AS rows,
ROUND(AVG(ship_days),2) AS avg_ship_days,
ROUND(AVG(shipping_cost),2) AS avg_shipping_cost,
ROUND(AVG(Profit),2) AS avg_profit,
ROUND(AVG(Sales),2) AS avg_sales
FROM x
GROUP BY 1
ORDER BY avg_ship_days
| ship_mode |
rows |
avg_ship_days |
avg_shipping_cost |
avg_profit |
avg_sales |
| Express Air |
983 |
2.00 |
7.99 |
149.98 |
1204.90 |
| Regular Air |
6270 |
2.04 |
7.66 |
176.19 |
1197.18 |
| Delivery Truck |
1146 |
2.04 |
45.35 |
235.29 |
5431.83 |
Ship-day distribution
SELECT
"Ship Mode" AS ship_mode,
ROUND(AVG(date_diff('day', strptime("Order Date", '%m/%d/%Y'), strptime("Ship Date", '%m/%d/%Y'))),2) AS avg_days,
MIN(date_diff('day', strptime("Order Date", '%m/%d/%Y'), strptime("Ship Date", '%m/%d/%Y'))) AS min_days,
MAX(date_diff('day', strptime("Order Date", '%m/%d/%Y'), strptime("Ship Date", '%m/%d/%Y'))) AS max_days
FROM superstore
GROUP BY 1
ORDER BY avg_days
| ship_mode |
avg_days |
min_days |
max_days |
| Express Air |
2.00 |
0 |
19 |
| Regular Air |
2.04 |
0 |
92 |
| Delivery Truck |
2.04 |
0 |
17 |