What Is the Problem?
Many companies rush to try the “next big thing” in data without ever pausing to ask the right business questions. Or learn basic data terminology. Or learn how to look at the world through a statistical lens.
Data Heads won't have that problem. Chapters 1-3 prepare you for the road ahead and put you in the right mindset to think about and understand data.
- Chapter 1: What Is the Problem?
- Chapter 2: What Is Data?
- Chapter 3: Prepare to Think Statistically
“A problem well stated is a problem half solved.”
—Charles Kettering, inventor & engineer
The first step on your journey to become a Data Head is to help your organization work on data problems that matter.
That may sound obvious, but we suspect many of you have looked on as companies talked about how great data is but then went on to overpromise impact, misinterpret results, or invest in data technologies that didn't add business value. It often seems as if data projects are undertaken because companies like the sound of what they are implementing without fully understanding why the project itself is important.
This disconnect leads to wasted time and money and can cause backlash against future data projects. Indeed, in a rush to find the hidden value they expect data to hold, many companies fail at the first step in the process: defining a business problem.1 So, in this chapter, we go back to the start.
In the next sections, we'll look at the helpful questions Data Heads should ask to make sure what you're working on matters. We'll then share an example where not asking these questions leads to a project failure. Finally, we'll discuss some of the hidden costs of not clearly defining a problem right from the start.
QUESTIONS A DATA HEAD SHOULD ASK
In our experience, going back to first principles and asking the fundamental questions required to solve a problem is easier said than done. Every company has a unique culture, and team dynamics don't always lend themselves to openly asking questions, especially ones that might make others feel undermined. And many of those becoming Data Heads find that they don't have the space to even begin asking the important questions that will drive the projects forward. Which is why having a culture in which to ask these questions is as important as the questions themselves.
There's no one-size-fits-all formula for every company and every Data Head. If you are a leader, we ask that you create an open environment that will get the questions going. (This starts with inviting the technical experts into the room.) And ask questions yourself. This exhibits humility, a key leadership trait, and encourages others to join in. If you are more junior, we encourage you to ask these questions anyway, even if you're concerned it might upset the status quo. Our advice is simply to do your best. From experience, we believe asking the right questions goes a lot further than not asking them.
We want you to be prepared in the right way, trained to spot project warning signs and raise concerns at the outset. With that, we introduce five questions you should ask before attacking a data problem:
- Why is this problem important?
- Who does this problem affect?
- What if we don't have the right data?
- When is the project over?
- What if we don't like the results?
Let's explain each in detail.
Why Is This Problem Important?
The first fundamental question is, “Why is this problem important?” It seems simple, but it's one that's often overlooked. We get caught up in the hype of how we're going to solve the problem—and what we think the solution can do—before the project even starts. At the end of this chapter, we'll talk about the true underlying effects of not answering this question. But at a minimum, this question sets the expectations for why a project should be undertaken. This matters because data projects take time and attention—and often additional investments in technology and data. Simply identifying the importance of the problem before starting will help ensure company resources are put to their best use.
You can ask the question in different ways:
- What keeps you (us) up at night?
- Why does this matter?
- Is this a new problem, or has it been solved already?
- What is the size of the prize? (What's the return on investment?)
You'll want to understand how each person sees the problem. This will help you create alignment on how everyone will support the project to solve it—and whether they agree it should start at all.
During these initial discussions, you'll want to keep the focus on the central business problem and pay close attention if you hear rumblings of recent technology trends. Talk of technical trends can quickly derail the meeting away from its business focus. Be on the lookout for two warning signs:
- Methodology focus: In this trope, companies simply think trying some new analysis method or technology will set them apart. You've heard this marketing fluff before: “If you're not using Artificial Intelligence (AI), you're already behind … .” Or, companies find some other buzzword they would like to incorporate (e.g., “sentiment analysis”).
- Deliverable focus: Some projects go off track because companies focus too much on what the deliverable will be. They say the project needs to have an interactive dashboard, for example. You start the project, but the outcome becomes the installation of the new dashboard or business intelligence system itself. Project teams need to take a step back and trace how what they want to build brings value to the organization.
It may come as a surprise—or a relief—that both warnings involve technology and how it should not be included when defining the problem. To be clear, at some point in the project, methodologies and deliverables enter the picture. To start, however, the problem should be in direct, clear language everyone can understand. Which is why we recommend you scrap the technical terminology and marketing rhetoric. Start with the problem to be solved, not the technology to be used.
Why does this matter? We've noticed project teams have a mix of people who are enamored by data or intimidated by it. Once the problem definition conversation steers toward analysis methods or technology, two things happen. First, anyone intimidated by data might freeze up and stop contributing to the discussion—defining the business problem. Second, those enamored by data quickly splinter the problem into technical subproblems that may or may not align to an actual business objective. Once the business problem morphs into data science subproblems, it may be weeks or months before failure is discovered. No one will want to revisit the main problem once the project work starts.
Fundamentally, teams must answer “Is this a real business problem that is worth solving, or are we doing data science for its own sake?” This is a good, albeit blunt, question to ask, especially now during the hype and confusion around data science and related fields.
Who Does This Problem Affect?
The next question you'll want to ask is, “Who does this problem affect?” The spirit of the question is not only to ask who the problem affects, but how that person's work will be different going forward.
You should think of all layers of the organization (and perhaps its clients, if any). We don't mean the data scientist who works on the problem or the engineering team who may have to maintain software. The Data Head needs to know who the end users are. Often, it's more than just the people in the room crafting the problem, so it's super important for you to find the people whose daily work will be affected and bring them into the meeting.
We suggest you name names. Whose work will be different if the question gets answered? If it's many people, bring in a small group to represent them. Create a list of these people and understand how they will be affected by the project. You'll want to tie these answers back to the last question.
An exercise to help you think through this is to do a solution trial run. Assume you can answer the question, and then ask your team:
- Can we use the answer?
- Whose work will change?
This, of course, assumes you even had the right data to answer the question. (As we'll see in Chapter 4, this can be a huge assumption.) But you should answer these questions and go through several scenarios where the problem has been solved. In many cases, answering these questions can strengthen the project and its impact, or may identify a project with no business benefit.
What If We Don't Have the Right Data?
Every dataset has a limited amount of information inside it, and at a certain point, no technology or analysis method will help you go any further.
In the authors' experience, not asking “What if we don't have the right data?” is where companies make some of the biggest mistakes—mistakes that could be avoided if the possibility were considered before the project started. Because what happens is this: everyone who has worked so far on the project now wants to take it to completion no matter what. Data Heads enter the project knowing that not having the right data is a possibility. They create contingencies to pivot to collecting better data to answer the question. Or, if the data doesn't exist, they go back to the original question and attempt to redefine the project scope.
When Is the Project Over?
Many of us have been part of projects that went on too long. When expectations aren't clear before the project starts, teams wind up attending meetings out of habit and generating reports no one bothers to read. Asking “When is the project over?” before the project starts can break this trend.
The question strikes at the heart of why the project was initiated and aligns expectations. Important problems are posed because some information or product is needed in the future that does not exist today. Find out what that final deliverable is. Doing this will rekindle conversations about the project's potential return on investment and whether the team has an agreed-upon metric to measure the project's impact.
So, gather project stakeholders and identify reasons the project could end. Some reasons are obvious, like when a project ends from a lack of funding or waning interest. Set those obvious failures aside and focus on what needs to be delivered to answer the business question and conclude the project. For data projects, the final deliverable is typically an insight (e.g., “how effective was the company's last marketing campaign?”) or an application (e.g., a predictive model that forecasts next week's shipping volume). Many projects will require additional work, perhaps ongoing support and maintenance, but this needs to be communicated to the team up front.
Don't assume you know the answer to this question until you've asked it.
What If We Don't Like the Results?
The last question a Data Head should ask prepares the stakeholders for something they'd rather overlook—the possibility their assumptions were wrong. “What if we don't like the results?” imagines you are at the point of no return. You've spent hours on a project only to find out the results show something different from what you expected. Notice this is different from having data that can't answer the question. Here, the data can answer the question, perhaps quite confidently, but the answer is not what the stakeholders wanted.
It's never easy to get to the end of a project only to find out the results were not what you expected. This all too real scenario happens more often than we'd like to admit. Thinking first about the possibility that the project might reach an unwanted conclusion will ensure you have a plan in motion when you have to deliver the bad news.
Asking this question will also expose differences in how individuals will accept the results of the project. For instance, consider our avatar George from the introduction. George is the type of person who would ignore results that don't align with his beliefs, while simultaneously promoting favorable results that do. The question will hopefully uncover his bias early on, before the project starts.
You don't want to start a project where you know there's only one accepted result.
UNDERSTANDING WHY DATA PROJECTS FAIL
Projects can fail for a host of reasons: lack of funding, tight timelines, wrong expertise, unreasonable expectations, and the like. Add data and analysis methods into the mix, and the list of possible failures not only grows but becomes obscured behind the analysis. A project team might apply an analysis method they can't explain on data they don't understand to solve a problem that doesn't matter—and still think they've succeeded.
Let's look at a scenario.
Customer Perception
You work for a Fortune 10 company, Company X, that recently received negative media attention for a socially insensitive marketing campaign. You've been assigned to a project to monitor “customer perception.”
The project team consists of the following:
- The project manager (you)
- The project sponsor (the person paying for it)
- Two marketing professionals (who don't have data backgrounds)
- A young data scientist (fresh out of college and eager to apply the techniques they learned)
At the kick-off meeting, the project sponsor and data scientist quickly and excitedly discuss something called “sentiment analysis.” The project sponsor heard about it at a recent tech conference after a competing company reported using it. The data scientist volunteers that they know sentiment analysis, having implemented it in their senior capstone project. They think it might be a good technique to apply to customer comments on the company's Twitter and Facebook pages. The marketers understand the technique as being able to interpret people's emotional reactions using social media data, but they don't say much.
The basic premise, you are told, is that sentiment analysis can automatically label a tweet or Facebook post as “positive” or “negative.” For instance, the phrase “Thank you for sponsoring the Olympics” is positive, whereas “Horrible customer service” is negative. Conceivably, the data scientist could count the daily totals of positives and negatives, plot the trends over time (and in real time!), and subsequently share the results via a dashboard for all to see. Most important: no one needs to read customer comments anymore. The machine will do it for everyone. So, it's decided. The project kicks off.
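To make that premise concrete, here is a minimal sketch of the kind of daily tally the data scientist has in mind. The word lists, sample comments, and dates are our own toy stand-ins, not part of the scenario; a real system would use a sentiment model trained on labeled social media data rather than keyword matching.

```python
# A toy sketch (not the scenario's actual code): label each comment as
# positive, negative, or neutral, then tally daily counts for a trendline.
from collections import Counter
from datetime import date

# Illustrative stand-ins for a real sentiment model.
POSITIVE_WORDS = {"thank", "thanks", "great", "love", "awesome"}
NEGATIVE_WORDS = {"horrible", "terrible", "worst", "awful", "hate"}

def label(comment: str) -> str:
    """Crude keyword-based sentiment label for one comment."""
    words = set(comment.lower().split())
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical comments pulled from social media, keyed by post date.
comments = [
    (date(2021, 1, 11), "Thank you for sponsoring the Olympics"),
    (date(2021, 1, 11), "Horrible customer service"),
    (date(2021, 1, 12), "Just placed an order"),
]

# Daily tallies of each label: the numbers behind the dashboard trendlines.
daily_counts = Counter((day, label(text)) for day, text in comments)
for (day, sentiment), count in sorted(daily_counts.items()):
    print(day, sentiment, count)
```

Even this toy version makes the real question visible: once the daily counts exist, who will act on them, and how?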
A month later, the data scientist proudly shows off Company X's Customer Perception Dashboard. It's updated each day to include the latest data and lists some of the week's “positive” comments along the side. Figure 1.1 zooms in on the main graphic in the dashboard: trendlines of sentiment over time. Only positive and negative counts are shown, even though most comments are neutral.
The project sponsor loves it. A week later, the dashboard is displayed on a monitor in the break room for all to see.
Success.
Six months later, the break room is renovated, and the monitor is removed.
No one notices.
FIGURE 1.1 Sentiment analysis trends
A postmortem of the project revealed no one in the company used the analysis, not even the marketers on the team. When asked why, the marketers admitted they weren't really comfortable with the original analysis. Yes, it was possible to label each communication as positive or negative. But the idea that nobody would need to read comments anymore seemed like wishful thinking. They questioned the degree to which the labeling process had even been useful. Further, they countered that perception couldn't be measured by online interaction alone, even if that was the dataset most readily available to support sentiment analysis.
Discussion
In this scenario, it seemed like everything went well. But the fundamental question—why is this problem important?—doesn't appear to have been brought up. Instead, the project team moved forward attempting to answer another question: “Can we build a dashboard to monitor the sentiment of customer feedback on the company's Twitter and Facebook pages?” The answer, of course, was yes, they could. But in the end the project wasn't useful or even important to the organization.
You would think marketers would have had more to say, but they were not identified as people who would have been affected by the project. In addition, this project exhibited two early warning signs in how the team attempted to solve the problem: methodology focus (sentiment analysis) and deliverable focus (dashboard).
Moreover, the project team in the Customer Perception scenario could have taken their problem, “Can we build a dashboard to monitor the sentiment of customer feedback on the company's Twitter and Facebook pages?” and performed a solution trial run. They could have assumed a dashboard was available and updated daily with positive/negative sentiments of social media comments:
- Can we use the answer? The team would be thinking about the relevance of sentiment analysis on customer perception. How can the team use the information? What is the possible business benefit of knowing the sentiment of customers on social media?
- Whose work will change? Suppose the team convinces itself that knowing sentiment is important in order to be good stewards of the business. But is someone going to monitor this dashboard? If the trends suddenly go down, do we do anything? What if they trend up?
At this point, the marketing team would have hopefully spoken up. Would they have known what to do differently in their daily work with that kind of information? Likely not. The project, in its current form, would have hit a wall.
If only they had asked the five questions.
WORKING ON PROBLEMS THAT MATTER
So far, we've tied project failures to not defining the underlying problem correctly. Mostly, we've placed this failure in terms of losing money, time, and energy. But there's a broader issue happening all over the data space, and it's something that you wouldn't expect.
Right now, the industry is focused on training as many data workers as possible to meet the demand. That means universities, online programs, and the like are churning out critical thinkers at lightning speed. And if working in data is all about uncovering the truth, then Data Heads want to do just that.
What does it mean, then, when they sit down to a project that doesn't whet their appetite? What does it mean for them to have to work on a poorly defined issue where their skills become bragging rights for executives but don't actually solve meaningful problems?
It means many data workers are dissatisfied with their jobs. Having them work on problems overly focused on technology with ambiguous outcomes leads to frustration and disillusionment. Kaggle.com, where data scientists from all over the world compete in data science competitions and learn new analysis methods, posted a survey asking data scientists what barriers they face at work.2 Several of the barriers, listed here, are directly related to poorly defined problems and improper planning:
- Lack of clear question to answer (30.4% of respondents experienced this)
- Results not used by decision makers (24.3%)
- Lack of domain expert input (19.6%)
- Expectations of project impact (15.8%)
- Integrating findings into decisions (13.6%)
This has obvious consequences. Those who aren't satisfied in their roles leave.
CHAPTER SUMMARY
The very premise and structure of this book is to teach you to ask more probing questions. It starts with the most important, and sometimes hardest, question: “What's the problem?”
In this chapter, you learned ways to refine and clarify the central business question and why problems involving data and analysis are particularly challenging. We shared five important questions a Data Head should ask when defining a problem. You also learned about early warning signs to spot when a question starts to go off track. If the question hints at a (1) methodology focus or a (2) deliverable focus, it's time to hit pause.
When these questions are answered, you are ready to get to work.
Notes
- 1 A robust data strategy can help companies mitigate these issues. Of course, an important component of any data strategy is to solve meaningful problems, and that's our focus in this chapter. If you'd like to learn more about high-level data strategy, see Jagare, U. (2019). Data science strategy for dummies. John Wiley & Sons.
- 2 2017 Kaggle Machine Learning & Data Science Survey. Data is available at www.kaggle.com/kaggle/kaggle-survey-2017. Accessed on January 12, 2021.