Episode 32 — 3.2 Select Statistical Approach: Descriptive, Predictive, Prescriptive, Inferential

In Episode Thirty-Two, titled “Three Point Two Select Statistical Approach: Descriptive, Predictive, Prescriptive, Inferential,” the objective is to match the kind of statistics used to the kind of question being asked, because the wrong approach can produce a polished answer to the wrong problem. Many real-world analysis failures happen when a team treats a descriptive question like a predictive one, or treats a predictive estimate like a prescriptive recommendation, and the audience then acts as if certainty exists where it does not. This episode builds a mental map that connects question types to method families, so the analyst can choose an approach deliberately instead of by habit. The exam angle is that CompTIA Data Plus questions often test whether the candidate recognizes what the stakeholder is truly asking, not whether they can name a specific formula. The practical angle is that selecting the right approach protects trust because it keeps conclusions aligned with evidence and with the limits of the data.

Descriptive statistics are used to summarize what already happened, and they are the backbone of most reporting because they turn raw records into comprehensible patterns. Descriptive work includes counts, totals, averages, medians, percentiles, and simple breakdowns by group, like region, channel, or device, which help the audience see what is typical and what is changing. The value of descriptive statistics is that they stay close to observed data, so the claims are usually straightforward to validate and explain. Even simple descriptive measures can be misleading if data quality issues like missingness or duplication exist, which is why descriptive work often pairs naturally with quality checks and careful definitions. When the question is “What happened last week,” descriptive statistics are often the correct tool because they provide a faithful summary rather than speculation.
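As a rough illustration, a descriptive summary of this kind can be sketched in a few lines of pandas; the table and column names here (region, order_value) are hypothetical, invented for the example rather than taken from the episode.

```python
import pandas as pd

# Hypothetical weekly orders data; the column names are illustrative only.
orders = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "order_value": [120.0, 80.0, 200.0, 95.0, 310.0],
})

# Descriptive summary: counts, totals, averages, medians, and a percentile,
# broken down by group. These stay close to the observed data.
summary = orders.groupby("region")["order_value"].agg(
    orders_count="count",
    total="sum",
    mean="mean",
    median="median",
    p90=lambda s: s.quantile(0.90),
)
print(summary)
```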

Inferential statistics are used to generalize from samples to populations, and they become important when measuring everyone is impractical or when the data represents only a subset of the group you care about. Inference uses sample information to estimate population properties, which can include confidence intervals, hypothesis tests, and other methods that quantify uncertainty about the population value. The main discipline is that inferential claims depend on how the sample was collected, because biased sampling can produce confident-looking but incorrect generalizations. Inference is also sensitive to assumptions, such as whether observations are independent or whether the sampling process creates clusters that reduce effective sample size. When the question is “What can we say about all customers based on a survey,” inferential methods are the right fit, but only when sampling and missingness are understood and explained.
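A minimal sketch of an inferential estimate, assuming a hypothetical survey of 400 customers drawn as a simple random sample with independent responses, could look like this normal-approximation confidence interval for a proportion; the counts are invented for illustration.

```python
import math

# Hypothetical survey: 180 of 400 sampled customers report satisfaction.
n, successes = 400, 180
p_hat = successes / n

# 95% confidence interval via the normal approximation. This assumes a
# simple random sample and independent responses; biased sampling would
# make the interval misleading no matter how narrow it looks.
z = 1.96
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"estimate {p_hat:.3f}, 95% CI ({p_hat - margin:.3f}, {p_hat + margin:.3f})")
```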

Predictive methods estimate likely future outcomes using available signals, and they are appropriate when the goal is forecasting, risk scoring, or anticipating behavior. Prediction can be simple, like trend extrapolation, or more advanced, like supervised learning models, but the common thread is that the output is a probabilistic estimate about what is likely to happen next. Predictive work requires careful attention to leakage, which means excluding features that contain future information unavailable at prediction time, because leakage makes accuracy look better than it is. It also requires validation on data not used to train or tune the model, because performance measured only on seen data is typically optimistic. When the question is “Which accounts are most likely to churn next month,” prediction is the appropriate approach, but it must be framed as likelihood with uncertainty rather than as certainty.
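To make the holdout idea concrete, here is a minimal sketch using scikit-learn on synthetic data; the feature descriptions in the comments are illustrative assumptions, and the point is that performance is measured only on accounts the model never saw.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical account features known BEFORE the prediction date,
# e.g., tenure, usage trend, support tickets. No post-outcome fields,
# which would leak the answer into the model.
n = 1000
X = rng.normal(size=(n, 3))
y = (X[:, 0] - X[:, 1] + rng.normal(size=n) > 0).astype(int)  # synthetic churn label

# Holdout split: the test set plays the role of unseen future accounts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)

# The output is a probability, framed as likelihood rather than certainty.
churn_prob = model.predict_proba(X_test)[:, 1]
print("holdout AUC:", round(roc_auc_score(y_test, churn_prob), 3))
```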

Prescriptive methods recommend actions under constraints, and they are used when the question moves from “what is likely” to “what should we do.” Prescriptive work often involves optimization, scenario evaluation, and decision analysis, where multiple actions have different costs, benefits, risks, and constraints like budget, staffing, or policy limits. The key difference is that prescription requires an objective function, meaning a clear definition of what “best” means, and it requires constraints that define what is feasible. Prescriptive outputs can depend on predictive inputs, such as forecasting demand before allocating resources, but the prescriptive layer adds the decision rule that turns predictions into recommendations. When the question is “How should we allocate recruiter time to improve hiring outcomes within a fixed budget,” prescriptive methods are the right category because the output is a recommended allocation, not just a description or forecast.
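As one possible sketch of that prescriptive layer, a tiny linear program with scipy could allocate recruiter hours across roles; the rates, budget, and caps below are invented for illustration, and in practice the offers-per-hour inputs would come from predictive estimates.

```python
from scipy.optimize import linprog

# Hypothetical inputs: expected accepted offers per recruiter hour for
# three roles, a fixed hour budget, and a per-role cap (the constraints).
offers_per_hour = [0.05, 0.08, 0.03]
budget_hours = 100
max_hours_per_role = 60

# The objective function defines what "best" means: maximize expected
# accepted offers. linprog minimizes, so the objective is negated.
result = linprog(
    c=[-r for r in offers_per_hour],
    A_ub=[[1, 1, 1]],          # total allocated hours cannot exceed the budget
    b_ub=[budget_hours],
    bounds=[(0, max_hours_per_role)] * 3,
)
print("recommended hours per role:", result.x)
print("expected accepted offers:", round(-result.fun, 2))
```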

Method choice should be based on data quality and available signals because weak data can make advanced methods produce confident nonsense. If missingness is clustered, labels are inconsistent, or key fields are unreliable, predictive performance can degrade and inferential conclusions can be biased. Descriptive statistics can still be useful in weak data environments if the limitations are stated clearly, because summaries can reveal issues and support operational fixes. Predictive methods also require stable relationships, meaning that signals which were informative in the past must remain informative enough to be useful in the future, which is not always true in changing systems. Prescriptive recommendations depend on trustworthy cost and outcome estimates, so weak measurement can lead to poor “optimal” decisions. A disciplined analyst chooses the simplest method that answers the question given the quality of evidence available.
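A quick check for clustered missingness might look like the following pandas sketch; the region and salary_band columns are invented for the example.

```python
import pandas as pd

# Hypothetical records where a key field is missing more often in one region.
df = pd.DataFrame({
    "region": ["East"] * 4 + ["West"] * 4,
    "salary_band": ["A", None, None, "B", "A", "B", "A", "B"],
})

# Missingness rate by segment. If gaps cluster in one segment, summaries
# and models built on that field will be biased for that segment.
missing_by_region = df["salary_band"].isna().groupby(df["region"]).mean()
print(missing_by_region)
```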

Assumptions are what connect math to reality, so checking assumptions like independence and stable distributions is part of selecting an approach, not a separate academic step. Independence matters because many methods assume observations do not influence each other, but real data often includes repeated measures from the same person or correlated behavior within teams, regions, or time periods. Stable distributions matter because methods often assume the process generating the data has not changed dramatically, but product releases, policy changes, and external shocks can shift behavior and break prior relationships. When assumptions are violated, inferential confidence intervals can be too narrow, predictive models can fail unexpectedly, and prescriptive recommendations can optimize for a world that no longer exists. Assumption checking is not about perfect proof; it is about spotting obvious mismatches between the method’s expectations and the data’s structure.
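One crude stability check, sketched here on simulated data, is to compare location and spread before and after a suspected change point such as a product release; the numbers below are synthetic, not real measurements.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily metric with a shift after a product release.
before = rng.normal(loc=10.0, scale=2.0, size=90)
after = rng.normal(loc=13.0, scale=2.0, size=90)

# Crude stability check: did the location or spread move noticeably?
for name, window in [("before", before), ("after", after)]:
    print(name, "mean", round(window.mean(), 2), "std", round(window.std(), 2))

# A mean shift of several standard errors suggests the generating process
# changed, so methods assuming a stable distribution need a second look.
se = np.sqrt(before.var() / len(before) + after.var() / len(after))
print("shift in standard errors:", round((after.mean() - before.mean()) / se, 1))
```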

A hiring pipeline example helps compare approaches because it naturally includes descriptive reporting, sampling questions, prediction opportunities, and decision constraints. Descriptively, the pipeline can be summarized by counts at each stage, time-to-move between stages, and conversion rates from application to offer to acceptance, segmented by role type or region. Inferentially, a survey of candidate experience could be used to estimate satisfaction for the broader candidate population, with uncertainty bounds that reflect sample size and sampling method. Predictively, historical signals like time-to-first-response, interview count, and role requirements might be used to estimate acceptance likelihood, recognizing that prediction must avoid leakage such as using post-offer negotiation notes. Prescriptively, the organization might decide how to allocate interviewer time or recruiter outreach across roles to maximize accepted offers under time constraints. Seeing all four approaches applied to the same context makes the categories feel like tools chosen by question type rather than labels memorized in isolation.
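The descriptive slice of that pipeline can be sketched as a simple funnel calculation; the stage names and counts below are invented for illustration.

```python
# Hypothetical stage counts for one role type; names are illustrative.
funnel = {"applied": 500, "screened": 200, "interviewed": 80,
          "offered": 25, "accepted": 18}

# Conversion rate from each stage to the next.
stages = list(funnel)
for current, nxt in zip(stages, stages[1:]):
    rate = funnel[nxt] / funnel[current]
    print(f"{current} -> {nxt}: {rate:.0%}")
```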

Avoiding false certainty is critical across all approaches, and the safest practice is to state uncertainty and confidence in plain language that matches what the method can truly support. Descriptive results can still include uncertainty when data completeness is known to vary, such as when late-arriving events may revise a number. Inferential results should express uncertainty explicitly through confidence intervals or clear statements of sampling limitations, because the whole purpose of inference is to describe what is not known precisely. Predictive results should be framed as probabilities or expected ranges, not as guaranteed outcomes, because prediction errors are inevitable even in strong models. Prescriptive recommendations should include sensitivity to assumptions, such as how the recommended action changes if demand is higher than expected or if costs shift. Clear uncertainty language protects trust because it prevents stakeholders from overcommitting based on an answer that cannot honestly promise certainty.

Separating correlation from causation is another essential control because the wrong causal story can drive harmful actions, especially when stakeholders interpret statistical relationships as proof of mechanism. A correlation can occur because two variables move together due to a shared driver, because one influences the other, or because data collection creates a spurious link, and descriptive and predictive methods often find correlations without identifying causes. Inferential methods can test differences between groups, but without experimental design or strong quasi-experimental reasoning, those differences do not automatically imply causation. Prescriptive methods can amplify this risk, because an optimization can recommend actions based on correlated signals that do not actually cause improvement. The disciplined communication move is to describe what is observed and what is supported, then avoid causal claims unless the evidence truly justifies them.
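A small simulation makes the shared-driver case concrete: in the sketch below, a hypothetical seasonality variable drives both ad spend and support tickets, so the two correlate strongly even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(2)

# Shared driver: seasonality influences both variables independently.
season = rng.normal(size=5000)
ad_spend = 2.0 * season + rng.normal(size=5000)
tickets = 1.5 * season + rng.normal(size=5000)

# A strong correlation appears with no causal link between the two series.
print("corr(ad_spend, tickets):", round(np.corrcoef(ad_spend, tickets)[0, 1], 2))
```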

Method complexity should match stakeholder needs and time limits because a perfect model delivered too late or explained too poorly can be less useful than a simpler approach delivered on time with clear meaning. Some stakeholders need a clear summary and a reliable trend for a meeting in an hour, and a complicated inferential or predictive approach may not add value in that setting. Other stakeholders, such as compliance or executive leadership, may need a defensible estimate with explicit uncertainty, and that can justify additional inferential work if time allows. Complexity also affects explainability, because a model that cannot be explained can trigger skepticism and reduce adoption even if it performs well. The practical goal is fit, meaning the method should produce an answer that is timely, understandable, and as accurate as the data and question allow.

Validation should be chosen to match the approach, and holdout checks and reasonableness tests are two general-purpose tools that support trust across categories. For predictive work, holdout checks measure performance on unseen data, which reveals whether the model generalizes beyond the examples it learned from. For descriptive and inferential work, reasonableness tests compare results to expectations, known totals, and simple sanity checks, like whether rates fall within plausible ranges and whether segmented totals reconcile to overall totals. Prescriptive work benefits from scenario testing, where recommended actions are evaluated under different plausible conditions to see whether the recommendation is robust or fragile. Validation is less about proving perfection and more about reducing the risk of confident error, which is what stakeholders fear most when relying on statistical outputs.
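Reasonableness tests can be as simple as assertions that totals reconcile and rates stay within plausible ranges, as in this sketch with invented numbers.

```python
import pandas as pd

# Hypothetical report values: an overall total and a per-segment breakdown.
overall_total = 1250
segment_totals = pd.Series({"East": 600, "West": 540, "Unknown": 110})

# Check that segmented totals reconcile to the overall total.
assert segment_totals.sum() == overall_total, \
    "segments do not reconcile to the overall total"

# Check that a reported rate falls within its plausible range.
conversion_rate = 0.42
assert 0.0 <= conversion_rate <= 1.0, "rate outside plausible range"
print("reasonableness checks passed")
```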

A decision tree for selecting an approach can be narrated as a sequence of yes-or-no questions that begins with intent and ends with a method family. The first question is whether the audience is asking what happened, and if so, descriptive statistics are usually the starting point, paired with quality checks that ensure the summary reflects reality. If the audience is asking about a population but only a sample is available, inferential methods become relevant, provided sampling and assumptions are understood well enough to support generalization. If the audience is asking what is likely to happen next, predictive methods fit, but only when signals are available and leakage is controlled, and when performance can be validated on unseen data. If the audience is asking what action to take under constraints, prescriptive methods apply, and they require explicit objectives, constraints, and sensitivity checks. This narrated tree keeps method selection aligned to question type, which is the core exam skill.
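That narrated tree can be written down as a yes-or-no cascade; the function below is an illustrative sketch of the episode's sequence, not an official rubric.

```python
def select_approach(asking_what_happened: bool,
                    asking_about_population_from_sample: bool,
                    asking_what_happens_next: bool,
                    asking_what_action_to_take: bool) -> str:
    """Map a stakeholder question to a method family (illustrative only)."""
    if asking_what_happened:
        return "descriptive, paired with data quality checks"
    if asking_about_population_from_sample:
        return "inferential, with sampling and assumptions stated"
    if asking_what_happens_next:
        return "predictive, with leakage control and holdout validation"
    if asking_what_action_to_take:
        return "prescriptive, with explicit objectives and constraints"
    return "clarify the question before selecting a method"

# Example: "Which accounts are most likely to churn next month?"
print(select_approach(False, False, True, False))
```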

The conclusion of Episode Thirty-Two assigns one practical exercise: take one real question and classify it by approach before doing any calculations, because classification prevents wasted effort and misframed conclusions. The question could be as simple as “What was the conversion rate last month,” which is descriptive, or “What is the likely acceptance rate next quarter,” which is predictive, or “What can we infer about all applicants from a survey sample,” which is inferential, or “How should we allocate recruiter effort given fixed capacity,” which is prescriptive. The value of the assignment is that it forces the analyst to name what kind of claim is being made and what kind of evidence is required, which then guides validation and communication. When this habit becomes routine, the choice of statistical approach becomes a control that protects accuracy and trust, rather than a last-minute label added after the fact.
