A step-by-step investigation of customer experience using KPI decomposition, diagnostic analytics, and root cause thinking

Image generated by Gemini AI

When I first started learning data analysis with Python, I thought the process was pretty simple: download a dataset, open a Python IDE or notebook, run df.head(), create a few charts, identify some insights, and summarize the findings. That workflow appeared in nearly every practice project I worked on, so I naturally assumed that’s what data analysts did.
As I’ve completed more projects, and most importantly, since I started working as a data analyst, I’ve found that companies aren’t actually asking for dashboards. Instead, they ask questions like:
"Why are customers complaining more often?"
"Why are delivery issues increasing?"
"Why are customer ratings getting worse?"
"What exactly is going wrong?"
To answer these questions, simply creating charts isn’t enough. It requires a deeper analysis of the data, identifying patterns, testing assumptions, and connecting those findings to real-world business problems.
That’s the mindset I want to adopt this time… hehehe..
Starting with the Problem, Not the Data

Right now, I’m working with the Customer Analytics dataset from Kaggle. Like always, my first instinct was to open the dataset and start exploring it. This time, before writing a single SQL query or line of Python, I decided to try something different.
I asked myself, “If I were actually working as a data analyst at this company, what kind of problems would management ask me to solve?”
Not “Show me the distribution of customer ratings,” “Create five visualizations,” or “Build a dashboard.”
A more realistic request would be: “Customers seem dissatisfied with their shopping experience. Could you look into what’s going on?”
Dashboards Describe. Investigations Explain.

Actually, there’s nothing wrong with dashboards. They’re incredibly useful. But dashboards often just tell us what’s happening; they don’t always explain why. And that’s exactly what I want to focus on in this project. Instead of compiling a collection of unrelated charts, I want each analysis to help answer a specific business question.
So… Where Does My Investigation Begin?

Here, I was facing a business challenge. Management wasn’t asking me for a new dashboard. Nor were they asking for five Power BI visualizations or reports. Instead, they asked a much more specific question:
“We’ve noticed that a surprisingly large proportion of orders are not being delivered on time. Could this operational issue be affecting our customers’ overall experience?”
Rather than making assumptions, I can check this concern against the data. Before diving into explanations or visualizations, I want to verify the KPI itself.
This quick calculation gives us the percentage of late deliveries.
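# Assumes df already holds the dataset as a pandas DataFrame.
# In this column, 1 marks an order that did NOT arrive on time.
late_rate = (
    df["Reached.on.Time_Y.N"] == 1
).mean()
print(f"Late Delivery Rate: {late_rate:.2%}")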
I am now facing a real business problem: not just a feeling or an assumption, but a measurable KPI. This immediately shifts the perspective of my investigation. Instead of saying, “Customers seem dissatisfied with their shopping experience.”
Now we can say:
“An early assessment of the KPI indicates that approximately 59.67% of orders were not delivered on time, raising concern about the potential impact on customer satisfaction and the overall customer experience.”
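Before quoting a rate like this, it can help to look at the raw counts behind it and confirm the flag column only contains 0 and 1. The snippet below is a minimal sketch of that check, not the notebook's actual code: `df_demo` is a tiny made-up frame standing in for the real dataset, and the variable names are hypothetical.

```python
import pandas as pd

# Tiny made-up sample standing in for the real dataset (illustrative values only).
# As in the article's code, 1 means the order did NOT arrive on time.
df_demo = pd.DataFrame({"Reached.on.Time_Y.N": [1, 1, 0, 1, 0]})

late_flag = df_demo["Reached.on.Time_Y.N"]

# Guard: the mean only equals the late rate if the flag is strictly 0/1.
assert late_flag.isin([0, 1]).all(), "unexpected values in the delivery flag"

late_orders = int((late_flag == 1).sum())
total_orders = len(late_flag)
late_rate_demo = late_orders / total_orders

print(f"{late_orders} of {total_orders} orders late ({late_rate_demo:.2%})")
```

Reporting the numerator and denominator next to the percentage makes the KPI easier to audit when someone later asks where the figure came from.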
And that raises the next question, “Is this operational issue actually affecting customer experience?”
Looking through the dataset, there isn’t a single column titled “Customer Experience Score.” Instead, customer experience must be inferred based on multiple indicators, such as Customer Rating, Customer Care Calls, Reached on Time, or Discount Offered.
These variables became the main focus of my investigation. But measuring these factors wasn’t enough to explain the issue. The next step was to understand why this might be happening.
From KPI to Hypothesis

After calculating KPIs, I’m often tempted to jump straight into creating charts and calling the results “insights”. But this time, I want every analysis to start with a hypothesis.
Rather than trying to prove a single explanation, I want to explore various possibilities and let the data guide the investigation. Here’s the investigation plan I have in mind:
Image generated by Gemini AI

Hypothesis 1. Delivery Performance May Affect Customer Experience

My first hypothesis is probably the most intuitive one, hehehe. If customers receive their orders late, it is reasonable to expect a poorer shopping experience, which could eventually show up as lower ratings. But rather than assuming that’s the case, I want to let the data answer that question. So, the first diagnostic question is:
Do customers experiencing delayed deliveries tend to report lower customer ratings?
rating_delivery = (
    df.groupby("Reached.on.Time_Y.N")
    .agg(
        avg_rating=("Customer_rating", "mean"),
        total_orders=("Customer_rating", "count")
    )
)
rating_delivery
I’m not trying to prove causality here. I just want to know if late deliveries are associated with worse customer ratings.
Surprisingly, the results didn’t support my initial hypothesis. Customers with delayed deliveries reported an average rating of 3.01, while on-time customers reported 2.97. A gap of 0.04 points is too small to suggest a meaningful relationship between delivery timeliness and customer ratings, and it even points in the opposite direction from what I expected. It challenges one of the most obvious business assumptions. Even though roughly 59.67% of orders were delivered late, customers don’t appear to rate their experience differently based on delivery status alone. At this stage, delivery performance doesn’t look like the main driver of customer ratings.
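Whether a gap between two group averages is "too small to matter" can be checked rather than eyeballed. One option is a permutation test, sketched below under stated assumptions: the two rating arrays are randomly generated stand-ins rather than the dataset, and `permutation_p_value` is a hypothetical helper, not something the original analysis used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up ratings (1-5) for two groups; these are NOT the article's data.
late = rng.integers(1, 6, size=600)
on_time = rng.integers(1, 6, size=400)

def permutation_p_value(a, b, n_perm=2000, seed=1):
    """Two-sided permutation test for a difference in group means."""
    local_rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        # Shuffle the group labels by shuffling the pooled values.
        shuffled = local_rng.permutation(pooled)
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        hits += int(diff >= observed)
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

p_value = permutation_p_value(late, on_time)
print(f"mean difference: {late.mean() - on_time.mean():+.3f}, p = {p_value:.3f}")
```

A large p-value would be consistent with the "no meaningful relationship" reading, while a small one would mean even a tiny gap is unlikely to be noise. Practical significance is a separate judgment either way.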
Of course, that doesn’t mean my investigation is over. It actually makes me wonder: if delays are so common, why are they even happening in the first place?
H1.1 If Delivery Performance Matters… Why Are Deliveries Delayed?

The previous analysis suggests that delayed deliveries may not directly explain customer ratings. But the late delivery rate remains unusually high. So the next diagnostic question is, “Which shipment mode experiences the highest proportion of delayed deliveries?”
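# Share of on-time vs. late orders within each shipment mode.
# "Mode_of_Shipment" is the column name assumed here; adjust if yours differs.
shipment_delay = pd.crosstab(
    df["Mode_of_Shipment"],
    df["Reached.on.Time_Y.N"],
    normalize="index"
)
shipment_delay = shipment_delay.rename(columns={0: "On Time", 1: "Late"})
shipment_delay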
Using proportions instead of raw counts makes it much easier to compare performance. Interestingly, the delay rates are almost identical across the board: Flight (60.16%), Ship (59.76%), and Road (58.81%). The gap between the highest and lowest mode is only about 1.35 percentage points, which is too small to blame a specific transportation mode. But honestly, ruling this out is still a huge clue for me. If one method had been a total disaster, it would be an obvious target to fix. Instead, this tells me the issue is much broader than the shipment mode itself, helping me eliminate one big suspect and shifting my focus to new questions.
H1.2 Is Shipment Mode Really the Problem?

Or is shipment mode simply reflecting operational inefficiencies occurring elsewhere in the fulfillment process?
One possible explanation is warehouse performance. To investigate this possibility, the next question is, “Which warehouse contributes most to delayed deliveries?”
warehouse_delay = pd.crosstab(
    df["Warehouse_block"],
    df["Reached.on.Time_Y.N"],
    normalize="index"
)
warehouse_delay
Once again, the same pattern shows up. Warehouse delay rates are almost identical, and even though Warehouse B has the highest late rate, the variation is too small to mean anything. So, I can’t blame a specific warehouse for these late deliveries.
Instead of pointing fingers at one bad warehouse, the data suggests that delays are spread evenly throughout the whole fulfillment network.
At this point, I’ve officially scratched off two major suspects:
- Shipment mode isn’t driving the delays.
- Warehouse location isn’t the culprit either.
Even though I haven’t found the root cause yet, eliminating these obvious operational guesses has narrowed down my search space!
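Both eliminations above follow the same recipe: compute the late rate per category, then look at how far apart the best and worst categories are. A small reusable helper could make that recipe explicit. This is only a sketch under assumptions: `late_rate_by` and the `segment` column are hypothetical names, and the rows are made up for illustration.

```python
import pandas as pd

def late_rate_by(frame, group_col, flag_col="Reached.on.Time_Y.N"):
    """Share of late orders (flag == 1) for each category of group_col."""
    is_late = frame[flag_col] == 1
    return is_late.groupby(frame[group_col]).mean().sort_values(ascending=False)

# Made-up rows for illustration only; "segment" is a hypothetical column name.
demo = pd.DataFrame({
    "segment": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Reached.on.Time_Y.N": [1, 1, 0, 0, 1, 1, 1, 0],
})

rates = late_rate_by(demo, "segment")
spread = rates.max() - rates.min()  # gap between the worst and best category

print(rates)
print(f"spread: {spread:.2%}")
```

With the real DataFrame, the same call could be pointed at the warehouse or shipment-mode column. A spread of a point or two, as seen in this investigation, is the signal that the category is probably not where the problem lives.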
Hypothesis 2. Customer Support May Reflect Service Quality

Shipping performance isn’t the only possible explanation. Another possibility is that customers who repeatedly contact customer service are experiencing unresolved issues during their shopping experience. If that assumption is true, customers with more customer care interactions should report lower ratings. This raises another diagnostic question, “Do customers making more customer care calls tend to report lower customer ratings?”
care_rating = (
    df.groupby("Customer_care_calls")
    .agg(
        avg_rating=("Customer_rating", "mean"),
        customers=("Customer_rating", "count")
    )
)
care_rating
The results show that customer ratings remain remarkably flat, no matter how many times a customer calls support. Instead of a consistent decline, the average scores stay within a narrow range of 2.96 to 3.08.
This tells me that the number of customer care calls alone isn’t a red flag for customer dissatisfaction here. In other words, frequent callers aren’t necessarily giving us terrible ratings. Of course, that doesn’t mean customer support is completely irrelevant. It just means the relationship isn’t as obvious as I initially thought.
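A grouped table shows the flatness visually, but a single rank correlation can summarize it in one number. The sketch below computes Spearman's rho as the Pearson correlation of the ranks, so it needs nothing beyond pandas. The values are made up for illustration, and this is a complementary check, not part of the original analysis.

```python
import pandas as pd

# Made-up values for illustration; not taken from the dataset.
demo = pd.DataFrame({
    "Customer_care_calls": [2, 3, 3, 4, 5, 6, 7],
    "Customer_rating":     [3, 1, 5, 2, 4, 3, 1],
})

# Spearman's rho is the Pearson correlation of the ranks,
# so ranking both columns first and calling .corr() is enough.
rho = demo["Customer_care_calls"].rank().corr(demo["Customer_rating"].rank())

print(f"Spearman rho: {rho:.2f}")
```

A value near zero would back up the "no clear pattern" reading, while a clearly negative value would suggest ratings fall as calls increase.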
Hypothesis 3. Pricing Strategy May Be Masking Deeper Problems

Finally, I also wanted to challenge another common business assumption.
Promotions are often used to stimulate purchases and improve customer engagement. However, discounts do not automatically improve customer experience. So another diagnostic question is, “Do customers receiving larger discounts still report poor customer ratings?”
To make the analysis easier to interpret, discounts were grouped into three categories.
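# Bucket discounts into three bands: (0, 10], (10, 30], (30, 100].
df["discount_group"] = pd.cut(
    df["Discount_offered"],
    bins=[0, 10, 30, 100],
    labels=[
        "Low",
        "Medium",
        "High"
    ]
)

# observed=True keeps only the bands that actually occur and avoids
# the categorical-groupby FutureWarning in recent pandas versions.
discount_rating = (
    df.groupby("discount_group", observed=True)
    .agg(
        avg_rating=("Customer_rating", "mean"),
        customers=("Customer_rating", "count")
    )
)
discount_rating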
The results show almost no variation between the discount groups. Customer ratings remain nearly identical no matter how big the discount is.
Surprisingly, the customer group getting massive discounts doesn’t look any happier than the others. This tells me that a discount strategy alone isn’t going to fix or explain the customer experience here.
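The discount grouping depends on how `pd.cut` treats values that land exactly on a bin edge, which is easy to get wrong. The sketch below runs the same bins and labels on a handful of hypothetical discount values chosen to sit on or near the edges. The sample values are made up and are not from the dataset.

```python
import pandas as pd

# Hypothetical discount values chosen to sit on or near the bin edges.
sample = pd.Series([0, 1, 10, 11, 30, 31, 65])

groups = pd.cut(
    sample,
    bins=[0, 10, 30, 100],
    labels=["Low", "Medium", "High"],
)

# Intervals are right-closed by default: (0, 10], (10, 30], (30, 100].
# A value of exactly 0 falls outside the first interval and becomes NaN.
print(pd.DataFrame({"discount": sample, "group": groups}))
```

So a discount of 10 is "Low" and 11 is "Medium". If the data could contain a discount of exactly 0, passing `include_lowest=True` would keep it in the first band instead of silently dropping it from the grouped averages.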
Hypothesis 4: Customer Loyalty May Influence Customer Experience

After running all those operational hypotheses, I noticed something kinda frustrating but interesting. Delivery performance? Flat. Customer support calls? No pattern. Discounts? Identical ratings.
At that point, I started wondering… Hmm, what if this isn’t an operational issue at all? What if customer experience is driven by something purely behavioral? Like customer loyalty?
My logic here is pretty simple: customers who buy from us repeatedly are probably more used to our shopping process, shipping timelines, and overall quirks. So, my next question is, “Do customers with more prior purchases actually give higher ratings?”
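# Bucket customers by prior purchases: (0, 3], (3, 6], (6, 10].
df["loyalty_group"] = pd.cut(
    df["Prior_purchases"],
    bins=[0, 3, 6, 10],
    labels=[
        "New",
        "Returning",
        "Loyal"
    ]
)

# observed=True keeps only the groups that actually occur and avoids
# the categorical-groupby FutureWarning in recent pandas versions.
loyalty_rating = (
    df.groupby("loyalty_group", observed=True)
    .agg(
        avg_rating=("Customer_rating", "mean"),
        customers=("Customer_rating", "count")
    )
)
loyalty_rating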
At first glance, the numbers aren’t exactly mind-blowing. I definitely wasn’t expecting loyal customers to rate a whole point higher than new ones.
But, hmm wait… a subtle pattern does creep up! The average ratings gradually increase the more prior purchases a customer has. While the gaps are small, returning and loyal customers consistently report slightly better scores than newer customers.
Now, does this prove that loyalty causes satisfaction? Not really. It’s possible that customers keep buying because they already liked us in the past, or maybe familiarity just sets more realistic expectations, so they don’t get disappointed as easily. Either way, this was one of the few hypotheses that actually showed a consistent directional trend across my whole analysis.
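A "consistent directional trend" can be stated as a simple check: does each group's average sit at or above the previous one, and how large is the total lift? The sketch below shows one way to express that. The three averages are placeholder numbers for illustration only, not the article's results, and the variable names are hypothetical.

```python
import pandas as pd

# Placeholder averages for illustration only; substitute the real
# avg_rating column from the loyalty table before drawing conclusions.
avg_by_group = pd.Series(
    [3.0, 3.1, 3.2],
    index=["New", "Returning", "Loyal"],
    name="avg_rating",
)

# True only if every group's average is at least as high as the previous one.
trend_is_consistent = bool(avg_by_group.is_monotonic_increasing)

# Total lift from the first to the last group, to keep the effect size in view.
total_lift = avg_by_group.iloc[-1] - avg_by_group.iloc[0]

print(f"consistent upward trend: {trend_is_consistent}, total lift: {total_lift:.2f}")
```

Keeping the lift next to the direction matters here: with only three groups, a monotonic pattern can appear by chance fairly easily, so the size of the effect deserves as much attention as its direction.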
Reality Check: Connecting the Dots (and Embracing the Flat Lines)

After testing all those hypotheses, I sat back and looked at the bigger picture. To be honest, the results were a bit of a reality check. Delivery performance, customer support calls, and discount strategies all turned out to be nearly flat lines that barely made a difference to customer ratings. The only real standout in this analysis was customer loyalty, which showed a small but positive trend.
In a textbook or a classroom project, I (and maybe you too, hehehe) expect to find that one perfect, glaring root cause. But in the real world of data, sometimes the most valuable insight is realizing that the answer simply isn’t in the data you are currently tracking.
These flat relationships are telling me that we might be measuring our logistics and warehouse operations perfectly, but we are missing how customers actually feel. Customer experience isn’t just a byproduct of isolated operational metrics. Instead, it’s likely shaped by behavioral and psychological context that lives outside this dataset, like product quality consistency, brand trust, or the frustrating gap between what marketing promised and what reality delivered. At the end of the day, you can’t fix customer perception purely with faster trucks or bigger discounts.
Business Recommendations

- Time to rethink how we measure customer experience. Current operational KPIs like delivery timeliness or support calls don’t capture the whole story. I highly recommend also tracking the gap between expected and perceived delivery experience, along with post-purchase satisfaction scores.
- Stop obsessing over logistics alone. Improving logistics or warehouse efficiency in isolation won’t automatically fix customer ratings. Operational tweaks should be treated as a bare minimum, not a magic fix for satisfaction.
- Investigate the missing drivers. For future analysis, we need to explore unobserved variables outside this dataset, such as product quality consistency, text-based review sentiment, and marketing-driven expectations.
- Strengthen customer lifecycle analysis. Since customer loyalty was the only metric showing a mild positive trend, it would be super useful to shift toward comparing first-time vs returning customer experiences and tracking satisfaction across repeat purchases.
Done! To explore my other projects, feel free to check out my Medium or GitHub. Stay tuned for more challenging projects.
jihanKamilah - Overview
Beyond Dashboards: Testing Business Hypotheses Through KPI Trees & Diagnostic Analytics was originally published in Code Like A Girl on Medium.