Business Analytics: Principles, Concepts, and Applications


About This eBook

ePUB is an open, industry-standard format for eBooks. However, support of ePUB and its many features varies across reading devices and applications. Use your device or app settings to customize the presentation to your liking. Settings that you can customize often include font, font size, single or double column, landscape or portrait mode, and figures that you can click or tap to enlarge. For additional information about the settings and features on your reading device or app, visit the device manufacturer’s Web site. Many titles include programming code or configuration examples. To optimize the presentation of these elements, view the eBook in single-column, landscape mode and adjust the font size to the smallest setting. In addition to presenting code and configurations in the reflowable text format, we have included images of the code that mimic the presentation found in the print book; therefore, where the reflowable format may compromise the presentation of the code listing, you will see a “Click here to view code image” link. Click the link to view the print-fidelity code image. To return to the previous page viewed, click the Back button on your device or app.

Business Analytics
Principles, Concepts, and Applications
What, Why, and How

Marc J. Schniederjans
Dara G. Schniederjans
Christopher M. Starkey

Pearson

Associate Publisher: Amy Neidlinger
Executive Editor: Jeanne Glasser Levine
Operations Specialist: Jodi Kemper
Cover Designer: Alan Clements
Cover Image: Alan McHugh
Managing Editor: Kristy Hart
Senior Project Editor: Lori Lyons
Copy Editor: Gill Editorial Services
Proofreader: Katie Matejka
Indexer: Erika Millen
Senior Compositor: Gloria Schurick
Manufacturing Buyer: Dan Uhrig

© 2014 by Marc J. Schniederjans, Dara G. Schniederjans, and Christopher M. Starkey
Pearson Education, Inc.
Upper Saddle River, New Jersey 07458

For information about buying this title in bulk quantities, or for special sales opportunities (which may include electronic versions; custom cover designs; and content particular to your business, training goals, marketing focus, or branding interests), please contact our corporate sales department at [email protected] or (800) 382-3419. For government sales inquiries, please contact [email protected]. For questions about sales outside the U.S., please contact [email protected].

Company and product names mentioned herein are the trademarks or registered trademarks of their respective owners. All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America
First Printing: April 2014
ISBN-10: 0-13-355218-7
ISBN-13: 978-0-13-355218-8

Pearson Education LTD.
Pearson Education Australia PTY, Limited.
Pearson Education Singapore, Pte. Ltd.
Pearson Education Asia, Ltd.

Pearson Education Canada, Ltd.
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education—Japan
Pearson Education Malaysia, Pte. Ltd.

Library of Congress Control Number: 2014931049

This book is dedicated to Miles Starkey. He is what brings purpose to our lives and gives us a future.

Contents-at-a-Glance
Preface
PART I: What Are Business Analytics
Chapter 1: What Are Business Analytics?
PART II: Why Are Business Analytics Important
Chapter 2: Why Are Business Analytics Important?
Chapter 3: What Resource Considerations Are Important to Support Business Analytics?
PART III: How Can Business Analytics Be Applied
Chapter 4: How Do We Align Resources to Support Business Analytics within an Organization?
Chapter 5: What Are Descriptive Analytics?
Chapter 6: What Are Predictive Analytics?
Chapter 7: What Are Prescriptive Analytics?
Chapter 8: A Final Case Study Illustration
PART IV: Appendixes
A: Statistical Tools
B: Linear Programming
C: Duality and Sensitivity Analysis in Linear Programming
D: Integer Programming
E: Forecasting
F: Simulation
G: Decision Theory
Index

Table of Contents
Preface
PART I: What Are Business Analytics
Chapter 1: What Are Business Analytics?
1.1 Terminology
1.2 Business Analytics Process
1.3 Relationship of BA Process and Organization Decision-Making Process
1.4 Organization of This Book
Summary
Discussion Questions
References
PART II: Why Are Business Analytics Important
Chapter 2: Why Are Business Analytics Important?
2.1 Introduction
2.2 Why BA Is Important: Providing Answers to Questions
2.3 Why BA Is Important: Strategy for Competitive Advantage
2.4 Other Reasons Why BA Is Important
2.4.1 Applied Reasons Why BA Is Important
2.4.2 The Importance of BA with New Sources of Data
Summary
Discussion Questions
References
Chapter 3: What Resource Considerations Are Important to Support Business Analytics?
3.1 Introduction
3.2 Business Analytics Personnel
3.3 Business Analytics Data
3.3.1 Categorizing Data
3.3.2 Data Issues
3.4 Business Analytics Technology
Summary
Discussion Questions
References
PART III: How Can Business Analytics Be Applied

Chapter 4: How Do We Align Resources to Support Business Analytics within an Organization?
4.1 Organization Structures Aligning Business Analytics
4.1.1 Organization Structures
4.1.2 Teams
4.2 Management Issues
4.2.1 Establishing an Information Policy
4.2.2 Outsourcing Business Analytics
4.2.3 Ensuring Data Quality
4.2.4 Measuring Business Analytics Contribution
4.2.5 Managing Change
Summary
Discussion Questions
References
Chapter 5: What Are Descriptive Analytics?
5.1 Introduction
5.2 Visualizing and Exploring Data
5.3 Descriptive Statistics
5.4 Sampling and Estimation
5.4.1 Sampling Methods
5.4.2 Sampling Estimation
5.5 Introduction to Probability Distributions
5.6 Marketing/Planning Case Study Example: Descriptive Analytics Step in the BA Process
5.6.1 Case Study Background
5.6.2 Descriptive Analytics Analysis
Summary
Discussion Questions
Problems
Chapter 6: What Are Predictive Analytics?
6.1 Introduction
6.2 Predictive Modeling
6.2.1 Logic-Driven Models
6.2.2 Data-Driven Models
6.3 Data Mining
6.3.1 A Simple Illustration of Data Mining
6.3.2 Data Mining Methodologies

6.4 Continuation of Marketing/Planning Case Study Example: Predictive Analytics Step in the BA Process
6.4.1 Case Study Background Review
6.4.2 Predictive Analytics Analysis
Summary
Discussion Questions
Problems
References
Chapter 7: What Are Prescriptive Analytics?
7.1 Introduction
7.2 Prescriptive Modeling
7.3 Nonlinear Optimization
7.4 Continuation of Marketing/Planning Case Study Example: Prescriptive Step in the BA Analysis
7.4.1 Case Background Review
7.4.2 Prescriptive Analysis
Summary
Addendum
Discussion Questions
Problems
References
Chapter 8: A Final Business Analytics Case Problem
8.1 Introduction
8.2 Case Study: Problem Background and Data
8.3 Descriptive Analytics Analysis
8.4 Predictive Analytics Analysis
8.4.1 Developing the Forecasting Models
8.4.2 Validating the Forecasting Models
8.4.3 Resulting Warehouse Customer Demand Forecasts
8.5 Prescriptive Analytics Analysis
8.5.1 Selecting and Developing an Optimization Shipping Model
8.5.2 Determining the Optimal Shipping Schedule
8.5.3 Summary of BA Procedure for the Manufacturer
8.5.4 Demonstrating Business Performance Improvement
Summary
Discussion Questions
Problems

PART IV: Appendixes
A: Statistical Tools
A.1 Introduction
A.2 Counting
A.3 Probability Concepts
A.4 Probability Distributions
A.5 Statistical Testing
B: Linear Programming
B.1 Introduction
B.2 Types of Linear Programming Problems/Models
B.3 Linear Programming Problem/Model Elements
B.4 Linear Programming Problem/Model Formulation Procedure
B.5 Computer-Based Solutions for Linear Programming Using the Simplex Method
B.6 Linear Programming Complications
B.7 Necessary Assumptions for Linear Programming Models
B.8 Linear Programming Practice Problems
C: Duality and Sensitivity Analysis in Linear Programming
C.1 Introduction
C.2 What Is Duality?
C.3 Duality and Sensitivity Analysis Problems
C.4 Determining the Economic Value of a Resource with Duality
C.5 Duality Practice Problems
D: Integer Programming
D.1 Introduction
D.2 Solving IP Problems/Models
D.3 Solving Zero-One Programming Problems/Models
D.4 Integer Programming Practice Problems
E: Forecasting
E.1 Introduction
E.2 Types of Variation in Time Series Data
E.3 Simple Regression Model
E.4 Multiple Regression Models
E.5 Simple Exponential Smoothing
E.6 Smoothing Averages
E.7 Fitting Models to Data
E.8 How to Select Models and Parameters for Models

E.9 Forecasting Practice Problems
F: Simulation
F.1 Introduction
F.2 Types of Simulation
F.3 Simulation Practice Problems
G: Decision Theory
G.1 Introduction
G.2 Decision Theory Model Elements
G.3 Types of Decision Environments
G.4 Decision Theory Formulation
G.5 Decision-Making Under Certainty
G.6 Decision-Making Under Risk
G.7 Decision-Making Under Uncertainty
G.8 Expected Value of Perfect Information
G.9 Sequential Decisions and Decision Trees
G.10 The Value of Imperfect Information: Bayes’s Theorem
G.11 Decision Theory Practice Problems
Index

About the Authors

Marc J. Schniederjans is the C. Wheaton Battey Distinguished Professor of Business in the College of Business Administration at the University of Nebraska-Lincoln and has served on the faculty of three other universities. Professor Schniederjans is a Fellow of the Decision Sciences Institute (DSI) and in 2014–2015 will serve as DSI’s President. His prior experience includes owning and operating his own truck leasing business. He is currently a member of the Institute for Supply Management (ISM), the Production and Operations Management Society (POMS), and the Decision Sciences Institute (DSI). Professor Schniederjans has taught extensively in operations management and management science. He has won numerous teaching awards and is an honorary member of the Golden Key honor society and the Alpha Kappa Psi business honor society. He has published more than a hundred journal articles and has authored or coauthored twenty books in the field of management. The title of his most recent book is Reinventing the Supply Chain Life Cycle, and his research has encompassed a wide range of operations management and decision science topics. He has also presented more than one hundred research papers at academic meetings. Professor Schniederjans is serving on five journal editorial review boards, including Computers & Operations Research, International Journal of Information & Decision Sciences, International Journal of Information Systems in the Service Sector, Journal of Operations Management, and Production and Operations Management. He is also serving as an area editor for the journal Operations Management Research and as an associate editor for the International Journal of Strategic Decision Sciences, the International Journal of the Society Systems Science, and Management Review: An International Journal (Korea). Professor Schniederjans has also served as a consultant and trainer to various business and government agencies.

Dara G. Schniederjans is an assistant professor of Supply Chain Management at the University of Rhode Island, College of Business Administration. She has published articles in journals such as Decision Support Systems, Journal of the Operational Research Society, and Business Process Management Journal. She has also coauthored two textbooks and coedited a readings book. She has contributed chapters to readings books utilizing quantitative and statistical methods. Dara has served as a guest coeditor for a special issue on Business Ethics in Social Sciences in the International Journal of Society Systems Science. She has also served as a website coordinator for the Decision Sciences Institute. She currently teaches courses in Supplier Relationship Management and Operations Management.

Christopher M. Starkey is an economics student at the University of Connecticut-Storrs. He has presented papers at the Academy of Management and Production and Operations Management Society meetings. He currently teaches courses in Principles of Microeconomics and has taught Principles of Macroeconomics. His current research interests include macroeconomic and monetary policy, as well as decision-making methodologies.

Preface

Like the face on the cover of this book, we are bombarded by information every day. We do our best to sort out and use the information to help us get by, but sometimes we are overwhelmed by the abundance of data. This can lead us to draw wrong conclusions and make bad decisions. When you are a global firm collecting millions of transactions and customer behavior data from all over the world, the size of the data alone can make the task of finding useful information about customers almost impossible. For that firm and even smaller businesses, the solution is to apply business analytics (BA). BA helps sort out large data files (called “big data”), find patterns of behavior useful in predicting the future, and allocate resources to optimize decision-making. BA involves a step-wise process that aids firms in managing big data in a systematic procedure to glean useful information, which can solve problems and pinpoint opportunities for enhanced business performance. This book has been written to provide a basic education in BA that can serve both academic and practitioner markets. In addition to bringing BA up to date with the literature and research, this book explains the BA process in simple terms, along with supporting methodologies useful in its application. Collectively, the statistical and quantitative tools presented in this book require no prerequisites beyond basic high school algebra. To support both markets, a substantial number of solved problems are presented along with some case study applications to train readers in the use of common BA tools and software. Practitioners will find the treatment of BA methodologies a useful review. Academic users will find chapter objectives and discussion questions helpful for serving their needs, while also having an opportunity to obtain an Instructor’s Guide with chapter-end problem solutions and exam questions. The purpose of this book is to explain what BA is, why it is important to know, and how to do it. To achieve this purpose, the book presents conceptual content, software familiarity, and some analytic tools.

Conceptual Content

The conceptual material is presented in the first eight chapters of the book. (See Section 1.4 in Chapter 1 for an explanation of the book’s organization.) The conceptual content covers much more than what BA is about. The book explains why BA is important in terms of providing answers to questions, how it can be used to achieve competitive advantage, and how to align an organization to make best use of it. The book explains the managerial aspects of creating a BA presence in an organization and the skills BA personnel are expected to possess. The book also describes data management issues such as data collection, outsourcing, data quality, and change management as they relate to BA. Having created a managerial foundation explaining “what” BA is and “why” it is important, the remaining chapters focus on “how” to do it. Embodied in a three-step process, BA is explained in terms of its descriptive, predictive, and prescriptive analytic steps. For each of these steps, this book presents a series of strategies and best practice guides to aid in the BA process.

Software

Much of what BA is about involves the use of software. Unfortunately, no single software package covers all aspects of BA, and many institutions prefer one type of software over others. To provide flexibility, this book’s use of software offers some options, and the book can even be used by readers who are not interested in running computer software. In this book, SPSS®, Excel®, and Lingo® software are utilized to model and solve problems. The software treatment mainly presents the output of these software systems, although some input and instructions on their use are provided. For those not interested in running software applications, the exposure to the printouts provides insight into their informational value. This book recognizes that academic curriculums prefer to train students in the use of software in their own way, so it does not duplicate basic software instruction. As a prerequisite to using this book, it is recommended that those interested in running software applications for BA become familiar with and be instructed on the use of whatever software is desired.

Analytic Tools

The analytic tool materials are chiefly contained in this book’s appendixes. BA is a tools-oriented subject drawing on statistics, management information systems (MIS), and quantitative methods. While the conceptual content in the book overviews how to undertake the BA process, the implementation of how to actually do BA requires quantitative tools. Because some practitioners and academic programs are less interested in the technical aspects of BA, the bulk of the quantitative material is presented in the appendixes. These appendixes provide an explanation and illustration of a substantial body of BA tools to support a variety of analyses. Some of the statistical tools that are explained and illustrated in this book include statistical counting (permutations, combinations, repetitions), probability concepts (approaches to probability, rules of addition, rules of multiplication, Bayes’ Theorem), probability distributions (binomial, Poisson, normal, exponential), confidence intervals, sampling methods, simple and multiple regression, charting, and hypothesis testing. Although management information systems are beyond the scope of this book, the software applications previously mentioned are utilized to illustrate search, clustering, and typical data mining applications of MIS technology. In addition, quantitative methods tools explained and illustrated in this book include linear programming, duality and sensitivity analysis, integer programming, zero-one programming, forecasting modeling, nonlinear optimization, simulation analysis, breakeven analysis, and decision theory (certainty, risk, uncertainty, expected value and opportunity loss analysis, expected value of perfect information, expected value of imperfect information).

We want to acknowledge the help of individuals who provided needed support for the creation of this book. First, we really appreciate the support of our editor, Jeanne Glasser Levine, and the outstanding staff at Financial Times Press/Pearson. They made creating this book a pleasure and worked with us to improve the final product. Decades of writing books with other publishers permitted us to recognize how a top-tier publisher like ours makes a difference. We thank Alan McHugh, who developed the image on our book cover. His constant willingness to explore and be innovative with ideas made a significant contribution to our book. We also want to acknowledge the great editing help we received from Jill Schniederjans. Her skill has reduced the wordiness and enhanced the content (making parts less boring to read).

Finally, we would like to acknowledge the help of Miles Starkey, whose presence and charm have lifted our spirits and kept us on track to meet completion deadlines. While many people have assisted in preparing this book, its accuracy and completeness are our responsibility. For all errors that this book may contain, we apologize in advance.

Marc J. Schniederjans
Dara G. Schniederjans
Christopher M. Starkey

Part I: What Are Business Analytics

Chapter 1
What Are Business Analytics?

1. What Are Business Analytics?

Chapter objectives:
• Define business analytics.
• Explain the relationship of analytics and business intelligence to the subject of business analytics.
• Describe the three steps of the business analytics process.
• Describe four data classification measurement scales.
• Explain the relationship of the business analytics process with the organization decision-making process.

1.1. Terminology

Business analytics begins with a data set (a simple collection of data or a data file) or commonly with a database (a collection of data files that contain information on people, locations, and so on). As databases grow, they need to be stored somewhere. Technologies such as computer clouds (hardware and software used for remote data storage, retrieval, and computational functions) and data warehousing (a collection of databases used for reporting and data analysis) store data. Database storage areas have become so large that a new term was devised to describe them. Big data describes the collection of data sets that are so large and complex that software systems are hardly able to process them (Isson and Harriott, 2013, pp. 57–61). Isson and Harriott (2013, p. 61) define little data as anything that is not big data. Little data describes the smaller data segments or files that help individual businesses keep track of customers. As a means of sorting through data to find useful information, the application of analytics has found new purpose.

Three terms in business literature are often related to one another: analytics, business analytics, and business intelligence. Analytics can be defined as a process that involves the use of statistical techniques (measures of central tendency, graphs, and so on), information system software (data mining, sorting routines), and operations research methodologies (linear programming) to explore, visualize, discover, and communicate patterns or trends in data. Simply put, analytics convert data into useful information. Analytics is an older term commonly applied to all disciplines, not just business. A typical example of the use of analytics is weather measurements that are collected and converted into statistics, which in turn are used to predict weather patterns.

There are many types of analytics, and there is a need to organize these types to understand their uses. We will adopt the three categories (descriptive, predictive, and prescriptive) that the Institute for Operations Research and the Management Sciences (INFORMS) (www.informs.org) suggests for grouping the types of analytics (see Table 1.1). These types of analytics can be viewed independently. For example, some firms may only use descriptive analytics to provide information on decisions they face. Others may use a combination of analytic types to glean insightful information needed to plan and make decisions.

Table 1.1 Types of Analytics

The purposes and methodologies used for each of the three types of analytics differ, as can be seen in Table 1.2. It is these differences that distinguish analytics from business analytics. Whereas analytics is focused on generating insightful information from data sources, business analytics goes the extra step to leverage analytics to create an improvement in measurable business performance. Whereas the process of analytics can involve any one of the three types of analytics, the major components of business analytics include all three used in combination to generate new, unique, and valuable information that can aid business organization decision-making. In addition, the three types of analytics are applied sequentially (descriptive, then predictive, then prescriptive). Therefore, business analytics (BA) can be defined as a process beginning with business-related data collection and consisting of sequential application of descriptive, predictive, and prescriptive major analytic components, the outcome of which supports and demonstrates business decision-making and organizational performance. Stubbs (2011, p. 11) believes that BA goes beyond plain analytics, requiring a clear relevancy to business, a resulting insight that will be implementable, and performance and value measurement to ensure a successful business result.

Table 1.2 Analytic Purposes and Tools

Business intelligence (BI) can be defined as a set of processes and technologies that convert data into meaningful and useful information for business purposes. While some believe that BI is a broad subject that encompasses analytics, business analytics, and information systems (Bartlett, 2013, p. 4), others believe it is mainly focused on collecting, storing, and exploring large database organizations for information useful to decision-making and planning (Negash, 2004). One function that is generally accepted as a major component of BI involves storing an organization’s data in computer cloud storage or in data warehouses. Data warehousing is not an analytics or business analytics function, although the data can be used for analysis. In application, BI is focused on querying and reporting, but it can include reported information from a BA analysis. BI seeks to answer questions such as what is happening now and where, and also what business actions are needed based on prior experience. BA, on the other hand, can answer questions like why something is happening, what new trends may exist, what will happen next, and what is the best course for the future.

In summary, BA includes the same procedures as plain analytics but has the additional requirement that the outcome of the analytic analysis must make a measurable impact on business performance. BA includes reporting results like BI but seeks to explain why the results occur based on the analysis, rather than just reporting and storing the results, as is the case with BI. Analytics, BA, and BI will be mentioned throughout this book. A review of characteristics to help differentiate these terms is presented in Table 1.3.

Table 1.3 Characteristics of Analytics, Business Analytics, and Business Intelligence

1.2. Business Analytics Process

The complete business analytic process involves the three major component steps applied sequentially to a source of data (see Figure 1.1). The outcome of the business analytic process must relate to business and seek to improve business performance in some way.

Figure 1.1 Business analytic process

The logic of the BA process in Figure 1.1 is initially based on a question: What valuable or problem-solving information is locked up in the sources of data that an organization has available? At each of the three steps that make up the BA process, additional questions need to be answered, as shown in Figure 1.1. Answering all these questions requires mining the information out of the data via the three steps of analysis that comprise the BA process. The analogy of digging in a mine is appropriate for the BA process because finding new, unique, and valuable information that can lead to a successful strategy is just as good as finding gold in a mine. SAS, a major analytic corporation (www.sas.com), actually has a step in its BA process, Query Drilldown, which refers to the mining effort of questioning and finding answers to pull up useful information in the BA analysis. Many firms routinely undertake BA to solve specific problems, while other firms undertake BA to explore and discover new knowledge to guide organizational planning and decision-making to improve business performance.

The size of some data sources can be unmanageable, overly complex, and generally confusing. Sorting out data and trying to make sense of its informational value requires the application of descriptive analytics as a first step in the BA process. One might begin simply by sorting the data into groups using the four possible classifications presented in Table 1.4. Also, incorporating some of the data into spreadsheets like Excel and preparing cross tabulations and contingency tables are means of restructuring the data into a more manageable form. Simple measures of central tendency and dispersion might be computed to try to capture possible opportunities for business improvement. Other descriptive analytic summarization methods, including charting, plotting, and graphing, can help decision makers visualize the data to better understand content opportunities.
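As a minimal sketch of this first descriptive pass (the book’s own illustrations rely on packages such as Excel and SPSS), the same kind of summarization can be done in a few lines of Python with the pandas library. The data, column names, and values below are hypothetical and serve only to illustrate descriptive measures and a cross tabulation.

import pandas as pd

# Hypothetical sales records; the column names and values are illustrative only
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "South", "South"],
    "income_level": ["High", "Low", "High", "Low", "High", "Low"],
    "ad_spend": [120.0, 80.0, 150.0, 60.0, 110.0, 90.0],
    "sales": [1350.0, 640.0, 1720.0, 510.0, 1280.0, 700.0],
})

# Measures of central tendency and dispersion for the numeric columns
print(df[["ad_spend", "sales"]].describe())

# A simple cross tabulation (contingency table) of the two categorical columns
print(pd.crosstab(df["region"], df["income_level"]))

# Group-level summaries can begin to surface possible business opportunities
print(df.groupby("income_level")["sales"].agg(["mean", "std"]))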

Table 1.4 Types of Data Measurement Classification Scales

From Step 1 in the Descriptive Analytic analysis (see Figure 1.1), some patterns or variables of business behavior should be identified representing targets of business opportunities and possible (but not yet defined) future trend behavior. Additional effort (more mining) might be required, such as the generation of detailed statistical reports narrowly focused on the data related to targets of business opportunities to explain what is taking place in the data (what happened in the past). This is like a statistical search for predictive variables in data that may lead to patterns of behavior a firm might take advantage of if the patterns of behavior occur in the future. For example, a firm might find in its general sales information that during economic downtimes, certain products are sold to customers of a particular income level if certain advertising is undertaken. The sales, customers, and advertising variables may be in the form of any of the measurable scales for data in Table 1.4, but they have to meet the three conditions of BA previously mentioned: clear relevancy to business, an implementable resulting insight, and performance and value measurement capabilities.

To determine whether observed trends and behavior found in the relationships of the descriptive analysis of Step 1 actually exist or hold true and can be used to forecast or predict the future, more advanced analysis is undertaken in Step 2, Predictive Analytic analysis, of the BA process. There are many methods that can be used in this step of the BA process. A commonly used methodology is multiple regression. (See Appendix A, “Statistical Tools,” and Appendix E, “Forecasting,” for a discussion on multiple regression and ANOVA testing.) This methodology is ideal for establishing whether a statistical relationship exists between the predictive variables found in the descriptive analysis. The relationship might be to show that a dependent variable is predictively associated with business value or performance of some kind. For example, a firm might want to determine which of several promotion efforts (independent variables measured and represented in the model by dollars in TV ads, radio ads, personal selling, and/or magazine ads) is most efficient in generating customer sale dollars (the dependent variable and a measure of business performance). Care would have to be taken to ensure the multiple regression model was used in a valid and reliable way, which is why ANOVA and other statistical confirmatory analyses are used to support the model development. Exploring a database using advanced statistical procedures to verify and confirm the best predictive variables is an important part of this step in the BA process. This answers the questions of what is currently happening and why it happened between the variables in the model. A single or multiple regression model can often forecast a trend line into the future. When regression is not practical, other forecasting methods (exponential smoothing, smoothing averages) can be applied as predictive analytics to develop needed forecasts of business trends. (See Appendix E.) The identification of future trends is the main output of Step 2 and the predictive analytics used to find them. This helps answer the question of what will happen.
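To make the regression idea concrete, the short Python sketch below (using the statsmodels library rather than the SPSS output presented later in the book) fits a multiple regression of sales on three hypothetical promotion variables and prints the usual fit statistics. The data are simulated and the coefficients are assumptions for illustration, not results from the book.

import numpy as np
import statsmodels.api as sm

# Simulated promotion spending (in $000s) and the sales it generates
rng = np.random.default_rng(1)
n = 40
tv, radio, magazine = rng.uniform(10, 100, (3, n))
sales = 50 + 4.0 * tv + 1.5 * radio + 0.2 * magazine + rng.normal(0, 25, n)

# Fit the multiple regression model with an intercept term
X = sm.add_constant(np.column_stack([tv, radio, magazine]))
model = sm.OLS(sales, X).fit()

# Coefficients, t-tests, R-squared, and the overall F (ANOVA) test
print(model.summary())

# Forecast sales for one promotion scenario (intercept, tv, radio, magazine spend)
x_new = np.array([[1.0, 80.0, 40.0, 20.0]])
print(model.predict(x_new))

The summary output plays the confirmatory role described above: predictors whose coefficients are not statistically significant would be reconsidered before the model is trusted as a forecasting tool.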

If a firm knows where the future lies by forecasting trends as it would in Step 2 of the BA process, it can then take advantage of any possible opportunities predicted in that future state. In Step 3, Prescriptive Analytics analysis, operations research methodologies can be used to optimally allocate a firm’s limited resources to take best advantage of the opportunities it found in the predicted future trends. Limits on human, technology, and financial resources prevent any firm from going after all opportunities it may have available at any one time. Using prescriptive analytics allows the firm to allocate limited resources to optimally achieve objectives as fully as possible. For example, linear programming (a constrained optimization methodology) has been used to maximize the profit in the design of supply chains (Paksoy et al., 2013). (Note: Linear programming and other optimization methods are presented in Appendixes B, “Linear Programming,” C, “Duality and Sensitivity Analysis in Linear Programming,” and D, “Integer Programming.”) This third step in the BA process answers the question of how best to allocate and manage decision-making in the future.

In summary, the three major components of descriptive, predictive, and prescriptive analytics arranged as steps in the BA process can help a firm find opportunities in data, predict trends that forecast future opportunities, and aid in selecting a course of action that optimizes the firm’s allocation of resources to maximize value and performance. The BA process, along with various methodologies, will be detailed in Chapters 5 through 8.
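As a small illustration of the prescriptive step (a sketch only; the book’s own prescriptive examples are built with LINGO and Excel), the linear program below allocates a hypothetical promotion budget across three channels to maximize predicted sales, using assumed per-dollar returns of the kind a predictive model might estimate.

from scipy.optimize import linprog

# Assumed predicted sales per promotion dollar for tv, radio, and magazine ads
returns = [4.0, 1.5, 0.2]
c = [-r for r in returns]                 # linprog minimizes, so negate to maximize

A_ub = [[1, 1, 1]]                        # total spending cannot exceed the budget
b_ub = [150]                              # overall budget in $000s
bounds = [(0, 100), (10, 100), (0, 50)]   # hypothetical channel-level policy limits

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(result.x)                           # optimal spend per channel
print(-result.fun)                        # maximum predicted sales

Appendixes B through D develop the linear and integer programming methodology behind this kind of allocation in much greater detail.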

1.3. Relationship of BA Process and Organization Decision-Making Process

The BA process can solve problems and identify opportunities to improve business performance. In the process, organizations may also determine strategies to guide operations and help achieve competitive advantages. Typically, solving problems and identifying strategic opportunities to follow are organization decision-making tasks. The latter, identifying opportunities, can be viewed as a problem of strategy choice requiring a solution. It should come as no surprise that the BA process described in Section 1.2 closely parallels classic organization decision-making processes. As depicted in Figure 1.2, the business analytic process has an inherent relationship to the steps in typical organization decision-making processes.

Figure 1.2 Comparison of business analytics and organization decision-making processes

The organization decision-making process (ODMP) developed by Elbing (1970) and presented in Figure 1.2 is focused on decision making to solve problems but could also be applied to finding opportunities in data and deciding what is the best course of action to take advantage of them. The five-step ODMP begins with the perception of disequilibrium, or the awareness that a problem exists that needs a decision. Similarly, in the BA process, the first step is to recognize that databases may contain information that could both solve problems and find opportunities to improve business performance. Then in Step 2 of the ODMP, an exploration of the problem to determine its size, impact, and other factors is undertaken to diagnose what the problem is. Likewise, the BA descriptive analytic analysis explores factors that might prove useful in solving problems and offering opportunities.

The ODMP problem statement step is similarly structured to the BA predictive analysis, seeking strategies, paths, or trends that clearly define a problem or opportunity for an organization to address. Finally, the ODMP’s last steps of strategy selection and implementation involve the same kinds of tasks that the BA process requires in the final prescriptive step (making an optimal selection of resource allocations that can be implemented for the betterment of the organization). The decision-making foundation that has served the ODMP for many decades parallels the BA process. The same logic serves both processes and supports organization decision-making skills and capacities.

1.4. Organization of This Book

This book is designed to answer three questions about BA:
• What is it?
• Why is it important?
• How do you do it?

To answer these three questions, the book is divided into three parts. In Part I, “What Are Business Analytics?”, Chapter 1 answers the “what” question. In Part II, the “why” question is answered in Chapter 2, “Why Are Business Analytics Important?” and Chapter 3, “What Resource Considerations Are Important to Support Business Analytics?” Knowing the importance of explaining how BA is undertaken, the rest of the book’s chapters and appendixes are devoted to answering that question. Chapter 4, “How Do We Align Resources to Support Business Analytics within an Organization?”, explains how an organization needs to support BA. Chapter 5, “What Are Descriptive Analytics?”, Chapter 6, “What Are Predictive Analytics?”, and Chapter 7, “What Are Prescriptive Analytics?”, detail and illustrate the three respective steps in the BA process. To further illustrate how to conduct a BA analysis, Chapter 8, “A Final Case Study Illustration,” provides an example of BA. Supporting the analytic discussions is a series of analytically oriented appendixes that follow Chapter 8.

Part III includes quantitative analyses utilizing computer software. In an effort to provide some diversity of software usage, SPSS, Excel, and LINGO software output are presented. SPSS and LINGO can be used together to duplicate the analysis in this book, or only Excel with the necessary add-ins can be used. Because of the changing nature of software and differing educational backgrounds, this book does not provide extensive software explanation.

In addition to the basic content that makes up the body of the chapters, there are pedagogical enhancements that can aid learning. All chapters begin with chapter objectives and end with a summary, discussion questions, and, where needed, references. In addition, Chapters 5 through 8 have sample problems with solutions, as well as additional assignment problems. Some of the more detailed explanations of methodologies are presented in the appendixes. Their positioning in the appendixes is designed to enhance content flow and permit more experienced readers a flexible way to select only the technical content they might want to use. An extensive index allows quick access to terminology.

Summary

This chapter has introduced important terminology and defined business analytics in terms of a unique process useful in securing information on which decisions can be made and business opportunities seized. Data classification measurement scales were also briefly introduced to aid in understanding the types of measures that can be employed in BA. The relationship of the BA process and the organization decision-making process was explained in terms of how they complement each other. This chapter ended with a brief overview of this book’s organization and how it is structured to aid learning. Knowing what business analytics are about is important, but it is equally important to know why they matter. Chapter 2 begins to answer that question.

Discussion Questions
1. What is the difference between analytics and business analytics?
2. What is the difference between business analytics and business intelligence?
3. Why are the steps in the business analytics process sequential?
4. How is the business analytics process similar to the organization decision-making process?
5. Why does interval data have to be relationally proportional?

References
Bartlett, R. (2013). A Practitioner’s Guide to Business Analytics. McGraw-Hill, New York, NY.
Elbing, A.O. (1970). Behavioral Decisions in Organizations. Scott Foresman and Company, Glenview, IL.

Isson, J.P., Harriott, J.S. (2013). Win with Advanced Business Analytics. John Wiley & Sons, Hoboken, NJ.
Negash, S. (2004). “Business Intelligence.” Communications of the Association of Information Systems. Vol. 13, pp. 177–195.
Paksoy, T., Ozceylan, E., Weber, G.W. (2013). “Profit-Oriented Supply Chain Network Optimization.” Central European Journal of Operations Research. Vol. 21, No. 2, pp. 455–478.
Stubbs, E. (2011). The Value of Business Analytics. John Wiley & Sons, Hoboken, NJ.

Part II: Why Are Business Analytics Important

Chapter 2
Why Are Business Analytics Important?

Chapter 3
What Resource Considerations Are Important to Support Business Analytics?

2. Why Are Business Analytics Important?

Chapter objectives:
• Explain why business analytics are important in solving business problems.
• Explain why business analytics are important in identifying new business initiatives.
• Describe the kinds of questions business analytics can help answer.
• Explain how business analytics can help an organization achieve a competitive advantage.
• Explain different types of competitive advantages and their relationship to business analytics.
• Explain the importance of business analytics for a business organization.

2.1. Introduction

Telecommunication and information systems are collecting data on every aspect of life at incredible speed and with incredible comprehensiveness. In addition, businesses are running opinion surveys and collecting all forms of data for their operations. With information system clouds providing large amounts of data that are easily available and data warehousing systems capable of storing big data in large databases, there is presently a need to process information out of data to gain knowledge and justify the data investment. As Demirkan and Delen (2013) have shown, placing large data into computer clouds can provide business analytics in a timely and agile way. Firms recognize the need for this information to be competitive, and business analytics is one strategy to gain the knowledge they seek.

The problem with big data or even small data files is that they can easily obscure the information desired. Sometimes a small alteration in a piece of data located in a file can change meanings. The 1960s television program The Prisoner used the catch phrase, “I want information.” When this phrase is seen in print or spoken, it denotes that someone wants information. Yet when the term was used in The Prisoner, it referred to “in” and “formation.” (That is, “I want in formation.”) The phrase was used to make the prisoner do what he was told and act like the others. Note that a single space in this second phrase completely changes the meaning. When small differences like these can alter meanings, mining big databases for relevant and useful business information becomes a real challenge.

Business analytics as a process is designed to meet this challenge.

2.2. Why BA Is Important: Providing Answers to Questions

It may seem overly virtuous, but BA is the next best thing to a crystal ball for answering important business questions. In each of the three steps of the BA process (from Chapter 1, “What Are Business Analytics?”), a variety of questions can and should be answered as a logical outcome of the analysis. The answers become the basis of information and knowledge that makes BA a valued tool for decision-making and helps explain why it is important to learn and use. As can be seen in Table 2.1, the kinds of questions a typical BA analysis can answer relate not only to each step in the BA process, but also to the context of time. To better understand the value of the information BA analysis provides and why this subject is important to improved business performance, a simple illustrative case scenario is presented.

*Source: Adapted from Exhibit 9.1 from Isson and Harriott (2013), p. 169.
Table 2.1 Questions Business Analytics Seeks to Answer*

In this illustrative case scenario, a local credit union offers a series of packaged homeowner loans that are periodically marketed by running a promotional campaign in a variety of media (print ads, television commercials, radio spots). The idea is to bring in new customers to make home loans that fit one of the packaged deals. Halfway through the marketing program, the credit union does not know if the business generated is due to the promotional campaign or just a result of its normal business cycle. To clarify, the credit union undertakes a BA analysis.

The resulting information from the BA analysis (based on the same questions as in Table 2.1) is presented in Table 2.2. Reading first the Descriptive step (Past, Present, and Future), and then sequentially following the same pattern with the Predictive and Prescriptive steps, the possible types of information gleaned from these BA questions and answers can be illustrated by this example.

Table 2.2 Credit Union Example of BA Analysis Information

The answers to the questions raised in the credit union example are typical of any business organization problem-solving or opportunity-seeking quest. The answers were not obtained by just using statistics, computer search routines, or operations research methodologies, but rather were a result of a sequential BA process. The informational value of the answers in this scenario suggests a measurable and precise course of action for the management of the credit union to follow. By continuously applying BA as a decision support system, firms have come to see not only why they need BA, but also how BA can become a strategy to achieve competitive advantage.

In a survey of businesses through the year 2012, Kiron et al. (2012) reported that applying business analytics gives organizations better access to data for decision-making and offers a competitive advantage.

2.3. Why BA Is Important: Strategy for Competitive Advantage

Companies that make plans that generate successful outcomes are winners in the marketplace. Companies that do not effectively plan tend to be losers in the marketplace. Planning is a critical part of running any business. If it is done right, it obtains the results that the planners desire. Business organization planning is typically segmented into the three types presented in Figure 2.1. The planning process usually follows a sequence from strategic, down to tactical, and then down to operational planning, although Figure 2.1 shows arrows of activities going up and down the depicted hierarchical structure of most business organizations. The upward flow in Figure 2.1 represents the information passed from lower levels up, and the downward flow represents the orders that are passed from higher levels of management down to lower levels for implementation. It can be seen in the Teece (2007) study and more recently in Rha (2013) that the three steps in the BA process and strategic planning embody the same efforts and steps.

*Source: Adapted from Figure 1.2 in Schniederjans and LeGrand (2013), p. 9.
Figure 2.1 Types of organization planning*

Effectively planning and passing down the right orders in hopes of being a business winner requires good information on which those orders can be based. Some information can become so valuable that it provides the firm a competitive advantage (the ability of one business to perform at a higher level, staying ahead of present competitors in the same industry or market). Business analytics can support all three types of planning with useful information that can give a firm a competitive advantage. Examples of the ways BA can help firms achieve a competitive advantage are presented in Table 2.3.

Table 2.3 Ways BA Can Help Achieve a Competitive Advantage

2.4. Other Reasons Why BA Is Important

There is an almost endless list of potential applications of BA to provide information on which decisions can be made or improved.

2.4.1. Applied Reasons Why BA Is Important

Some potential applications for decision-making will be illustrated in later chapters. Several brief examples are described in Table 2.4.

Table 2.4 Applications of BA to Enhance Decision-Making

2.4.2. The Importance of BA with New Sources of Data

As advances in new computer and telecommunication technologies take place, they provide new types of data, so new types of analytics need to be applied in BA analyses. Digital analytics is a term that describes analytics applied to any source of data conveyed in digital form. Examples of these new sources of data-based analytics include text analytics and unstructured data analytics.

Text analytics can be defined as a set of linguistic, statistical, and computer-based techniques that model and structure the information content from textual sources (Basole et al., 2013). It is a search process in databases to find patterned text material that provides useful information. Also referred to as text data mining, text analytics uses data mining software to look into databases to find and validate the kinds of information on which predictions can be made. Being able to search and quantify textual data using text analytics opens great opportunities to glean information about customers and markets based on technology-driven data collection technologies.

One example of technology-driven data is social media data. Social media can be defined as interactions or communications among people or communities, usually performed on a technology platform, involving the sharing, creating, discussing, and modifying of communicated verbal or electronic content. Two global social platforms are Twitter and Facebook. The methodologies or technologies used in the purveyance of social media data can include any means of distribution of verbal or other types of communications, including, but not limited to, photographs or pictures, video, Internet forums, web logs, discussion forums, social blogs, wikis, social networks, and podcasts. These sources of data are the basis of social media analytics, whose information can aid in learning about new types of social media behavior. They provide a great challenge for BA analysts because of their excessive volume and the difficulty of quantifying the information in useful ways. They also provide a great opportunity to find information that might create a competitive advantage. An example of how social media analytics helped find auto defects was illustrated in a study by Abrahams et al. (2012). By employing text mining on a social medium (online discussion forums) used by vehicle enthusiasts, a variety of quality defects were identified, categorized, and prioritized for automobile manufacturers to correct.
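As a toy illustration of the pattern-finding idea behind text analytics (this is not the methodology of the Abrahams et al. study, and the forum posts and term list below are invented), a first pass in Python might simply count defect-related terms across posts to flag candidate quality issues for closer review.

from collections import Counter
import re

# Hypothetical vehicle-forum posts and a hand-built list of defect-related terms
posts = [
    "Brakes squeal badly after the recall work was done.",
    "Love the car, but the brakes grind in cold weather.",
    "Transmission slips between second and third gear.",
]
defect_terms = {"brakes", "grind", "squeal", "transmission", "slips", "stalls"}

counts = Counter()
for post in posts:
    tokens = re.findall(r"[a-z']+", post.lower())
    counts.update(t for t in tokens if t in defect_terms)

# Terms mentioned most often become candidates for categorization and prioritization
print(counts.most_common())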

Another similar digital source of analytics is referred to as mobile analytics. Mobile analytics can be defined as analytics based on any data secured from mobile devices, such as smartphones, iPhones, iPads, and Web browsers. These are all mobile technologies used to obtain digital data from the interaction of people (Evans, 2010). The fact that they are mobile and move from location to location with the user differentiates the type of information available to analytics analysts. For example, mobile technology allows analysts not only to track what a potential customer might say about the use of a product (as in social media analytics), but also to track the movements and locations where the customer makes decisions on products. That can help explain why those decisions are made. For example, mobile technology might reveal that a purchaser of hair spray was physically located near an area where billboards are used for hair spray advertising, thus helping to reveal the possible connection and effectiveness of a billboard promotion.

When data is placed in databases and can be logically filed, accessed, referenced, and used, it is known as structured data. When data or information, either digital or nondigital, cannot be put into a database or has no predefined structure, it is known as unstructured data. Examples of unstructured data include images, text, and other data that, for one reason or another, cannot be placed in a logically searchable database based on content. This data can be digitally stored in a file, but not in a way that can be usefully retrieved using any kind of logic model or sorting process. Much of the data contained in emails and on the Web is unstructured. Another way of looking at unstructured data is that it is what is left over and cannot be placed in a structured database. As time goes on, more effort in developing complex algorithms and other computer-based technologies will be applied to unstructured data, reducing the amount of data that falls into this category. Given the volume of graphics data and other unstructured data generated every day, the challenge to BA analysts will be an ever-growing effort to understand and structure the unstructured data that remains in an effort to gain its informational value. Part of the value and importance of BA is in accepting this challenge.

Summary

This chapter sought to explain why business analytics is an important subject for business organizations. It discussed how BA can answer important questions and how it can help a firm achieve a competitive advantage. In addition, it presented the role of BA in organization planning. Finally, it introduced other types of digital analytics to explain their beneficial role and challenges to BA.

We move in the next chapter to further explain why BA is important in the context of its required investment. Like any management task, the successful use of BA requires an investment in human and technology resources. Chapter 3, “What Resource Considerations Are Important to Support Business Analytics?”, explores the allocation of resources to maximize BA performance and explains why the investment is needed.

Discussion Questions
1. Why does each step in the business analytics process have a past, present, and future dimension?
2. What is a competitive advantage, and how is it related to BA?
3. Why does having the ability to aid in decision-making make BA important?
4. How does BA help achieve sustainability?
5. What are digital analytics?

References
Abrahams, A., Jiao, J., Wang, G., Fan, W. (2012). “Vehicle Defect Discovery from Social Media.” Decision Support Systems. Vol. 54, No. 1, pp. 87–97.
Basole, R., Seuss, C., Rouse, W. (2013). “IT Innovation Adoption by Enterprises: Knowledge Discovery through Text Analytics.” Decision Support Systems. Vol. 54, No. 2, pp. 1044–1054.
Demirkan, H., Delen, D. (2013). “Leveraging the Capabilities of Service-Oriented Decision Support Systems: Putting Analytics and Big Data in Cloud.” Decision Support Systems. Vol. 55, No. 1, pp. 412–421.
Evans, B. (2010). “The Rise of Analytics and Fall of the Tactical CIO.” InformationWeek. December 6, No. 1286, p. 14.
Fitz-enz, J. (2013). “Predictive Analytics Applied to Human Resources.” In Isson, J.P., Harriott, J.S. (2013). Win with Advanced Business Analytics. John Wiley & Sons, Hoboken, NJ.
Isson, J.P., Harriott, J.S. (2013). Win with Advanced Business Analytics. John Wiley & Sons, Hoboken, NJ.
Kiron, D., Kirk-Prentice, P., Boucher-Ferguson, R. (2012). “Innovating with Analytics.” MIT Sloan Management Review. Vol. 54, No. 1, pp. 47–52.
Rha, J.S. (2013). “Ambidextrous Supply Chain Management as a Dynamic Capability: Building a Resilient Supply Chain” (Doctoral Dissertation).
Teece, D.J. (2007). “Explicating Dynamic Capabilities: The Nature and Microfoundations of (Sustainable) Enterprise Performance.” Strategic Management Journal. Vol. 28, No. 13, pp. 1319–1350.

3. What Resource Considerations Are Important to Support Business Analytics?

Chapter objectives:
• Explain why personnel, data, and technology are needed in starting up a business analytics program.
• Explain what skills business analytics personnel should possess and why.
• Describe the job specialties that exist in business analytics.
• Describe database encyclopedia content.
• Explain the categorization of data in terms of sources.
• Describe internal and external sources of data.
• Describe an information technology infrastructure.
• Describe a database management system and how it supports business analytics.

3.1. Introduction

To fully understand why business analytics (BA) is necessary, one must understand the nature of the roles BA personnel perform. In addition, it is necessary to understand resource needs of a BA program to better comprehend the value of the information that BA provides. The need for BA resources varies by firm to meet particular decision support requirements. Some firms may choose to have a modest investment, whereas other firms may have BA teams or a department of BA specialists. Regardless of the level of resource investment, at minimum, a BA program requires resource investments in BA personnel, data, and technology.

3.2. Business Analytics Personnel

One way to identify the personnel needed for a BA staff is to examine what is required for certification in BA by organizations that provide BA services. INFORMS (www.informs.org/Certification-Continuing-Ed/AnalyticsCertification), a major academic and professional organization, announced the startup of a Certified Analytics Professional (CAP) program in 2013. Another more established organization, Cognizure (www.cognizure.com/index.aspx), offers a variety of service products, including business analytic services. It offers a general certification Business Analytics Professional (BAP) exam that measures existing skill sets in BA staff and identifies areas needing improvement (www.cognizure.com/cert/bap.aspx). This is a tool to validate technical proficiency, expertise, and professional standards in BA. The certification consists of three exams covering the content areas listed in Table 3.1.

Table 3.1 Cognizure Organization Certification Exam Content Areas*
*Source: Adapted from Cognizure Organization website (www.cognizure.com/cert/bap.aspx).
Most of the content areas in Table 3.1 will be discussed and illustrated in subsequent chapters and appendixes. The three exams required in the Cognizure certification program can easily be understood in the context of the three steps of the BA process (descriptive, predictive, and prescriptive)

discussed in previous chapters. The certification topics shown in Figure 3.1 map to the three major steps in the BA process. The basic statistical tools apply to the descriptive analytics step, the more advanced statistical tools apply to the predictive analytics step, and the operations research tools apply to the prescriptive analytics step. Some of the tools can be applied to both the descriptive and the predictive steps. Likewise, tools like simulation can be applied to answer questions in both the predictive and the prescriptive steps, depending on how they’re used. Where all the tools come together is in case studies. The use of case studies is designed to provide practical experience where all tools are employed to answer important questions or seek opportunities.

Figure 3.1 Certification content areas and their relationship to the steps in BA Other organizations also offer specialized certification programs. These certifications include other areas of knowledge and skills beyond just analytic tools. IBM, for example, offers a variety of specialized BA certifications (www-03.ibm.com/certify/certs/ba_index.shtml). Although these include certifications in several dozen statistical, information systems,

and analytic methodologies related to BA, they also include specialized skill sets related to BA personnel (administrators, designers, developers, solution experts, and specialists), as presented in Table 3.2.

Table 3.2 Types of BA Personnel*
*Source: Adapted from IBM website (www-03.ibm.com/certify/certs/ba_index.shtml).
The variety of positions and roles that participants play in the BA process leads to the question of what skill sets or competencies are needed to function in BA. In a general sense, BA positions require competencies in business, analytic, and information systems skills. As listed in Table 3.3, business skills involve basic management of people and processes. BA personnel must communicate with BA staffers within the organization (the BA team members) and with the other functional areas within a firm (BA customers and users) to be useful. Because they serve a variety of functional areas within a firm, BA personnel need to possess customer service skills so they can interact with the firm’s personnel and understand the nature of the problems they seek to solve. BA personnel also need to sell their services to users inside the firm. In addition, some must lead a BA team or department, which requires considerable interpersonal management and leadership skills and abilities.

Table 3.3 Select Types of BA Personnel Skills or Competency Requirements
Fundamental to BA is an understanding of the analytic methodologies listed in Table 3.1 and others not listed. In addition to knowing the tool sets, there is a need to know how they are integrated into the BA process to leverage data (structured or unstructured) and obtain the information desired by the customers who will be guided by the analytics. In summary, people who undertake a career in BA are expected to know how to interact with people and utilize the necessary analytic tools to leverage data into useful information that can be processed, stored, and shared in information systems in a way that guides a firm to higher levels of business performance.

3.3. Business Analytics Data
Structured and unstructured data (introduced in Chapter 2, “Why Are Business Analytics Important?”) is needed to generate analytics. As a beginning for organizing data into an understandable framework, statisticians usually categorize data into meaningful groups.

3.3.1. Categorizing Data There are many ways to categorize business analytics data. Data is commonly categorized by either internal or external sources (Bartlett, 2013, pp. 238–239). Typical examples of internal data sources include those presented in Table 3.4. When firms try to solve internal production or service operations problems, internally sourced data may be all that is needed. Typical external sources of data (see Table 3.5) are numerous and provide great diversity and unique challenges for BA to process. Data can be measured quantitatively (for example, sales dollars) or qualitatively by preference surveys (for example, products compared based on consumers preferring one product over another) or by the amount of consumer discussion (chatter) on the Web regarding the pluses and minuses of competing products.

Table 3.4 Typical Internal Sources of Data on Which Business Analytics Can Be Based

Table 3.5 Typical External Sources of Data on Which Business Analytics Can Be Based
A major portion of the external data sources is found in the literature. For example, the US Census and the International Monetary Fund (IMF) are useful data sources at the macroeconomic level for model building. Likewise, audience and survey data might be sourced from Nielsen (www.nielsen.com/us/en.html), psychographic or demographic data from Claritas (www.claritas.com), and financial data from Equifax (www.equifax.com), Dun & Bradstreet (www.dnb.com), and so forth.

3.3.2. Data Issues
Regardless of the source of data, it has to be put into a structure that makes it usable by BA personnel. We will discuss data warehousing in the next section, but here we focus on a couple of data issues that are critical to the usability of any database or data file. Those issues are data quality and data privacy. Data quality can be defined as data that serves the purpose for which it is collected. It means different things for different applications, but there are some commonalities of high-quality data. These qualities usually include accurately representing reality, measuring what it is supposed to measure, being timely, and being complete. When data is of high quality, it helps ensure competitiveness, aids customer service, and improves profitability. When data is of poor quality, it can provide information that is contradictory, leading to misguided decision-making. For example, missing data in files can prohibit some forms of statistical modeling, and incorrect coding of information can render databases completely useless. Data quality requires effort on the part of data managers to cleanse data of erroneous information and repair or replace missing data. We will discuss some of these data quality measures in later chapters. Data privacy refers to the protection of shared data such that access is permitted only to those users for whom it is intended. It is a security issue that requires balancing the need to know with the risks of sharing too much. There are many risks in leaving unrestricted access to a company’s database. For example, competitors can steal a firm’s customers by accessing addresses. Data leaks on product quality failures can damage brand image, and customers can become distrustful of a firm that shares information given in confidence. To avoid these issues, a firm needs to abide by the current legislation regarding customer privacy and develop a program devoted to data privacy. Collecting and retrieving data and computing analytics requires the use of computers and information technology. A large part of what BA personnel do is related to managing information systems to collect, process, store, and retrieve data from various sources.

3.4. Business Analytics Technology Firms need an information technology (IT) infrastructure that supports personnel in the conduct of their daily business operations. The general requirements for such a system are stated in Table 3.6. These types of technology are elemental needs for business analytics operations.

Table 3.6 General Information Technology (IT) Infrastructure
Of particular importance for BA are the data management technologies listed in Table 3.6. A database management system (DBMS) is data management software that permits firms to centralize data, manage it efficiently, and provide access to stored data by application programs. A DBMS usually serves as an interface between application programs and the physical data files of structured data. A DBMS makes the task of understanding where and how the data is actually stored more efficient. In addition, some DBMSs can handle unstructured data. For example, object-oriented DBMSs are able to store and retrieve unstructured data, like drawings, images, photographs, and voice data. These types of technology are necessary to handle the load of big data that most firms currently collect. A DBMS includes capabilities and tools for organizing, managing, and accessing data in databases. Four of the more important capabilities are its data definition language, data dictionary, database encyclopedia, and data manipulation language. A DBMS has a data definition capability to specify the structure of content in a database. This is used to create database tables and the characteristics used in fields to identify content. These tables and characteristics are critical success factors for search efforts as the database

grows in size. These characteristics are documented in the data dictionary (an automated or manual file that stores the size, descriptions, format, and other properties needed to characterize data). The database encyclopedia is a table of contents listing a firm’s current data inventory and what data files can be built or purchased. The typical content of the database encyclopedia is presented in Table 3.7. Of particular importance for BA are the data manipulation language tools included in a DBMS. These tools are used to search databases for specific information. An example is Structured Query Language (SQL), which allows users to find specific data through a session of queries and responses in a database.
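Because the book’s examples rely on packaged tools rather than code, the following is only a minimal sketch of what a data definition statement and an SQL query look like in practice, using Python’s built-in sqlite3 module; the table name, columns, and rows are hypothetical and are not taken from the book’s data.

import sqlite3

# Create an in-memory database and define a table (data definition language).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")

# Load a few hypothetical rows.
rows = [
    ("West", "Widget", 1200.0),
    ("East", "Widget", 950.0),
    ("West", "Gadget", 780.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Data manipulation language: an SQL query that answers a specific question.
query = """
    SELECT region, SUM(amount) AS total_sales
    FROM sales
    GROUP BY region
    ORDER BY total_sales DESC
"""
for region, total in conn.execute(query):
    print(region, total)

conn.close()

The CREATE TABLE statement plays the data definition role described above, while the SELECT query is an example of the data manipulation language used to pull specific information back out of the database.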

Table 3.7 Database Encyclopedia Content
Data warehouses are databases that store current and historical data of potential interest to decision makers. What a data warehouse does is make data available to anyone who needs access to it. In a data warehouse, the data is prohibited from being altered. Data warehouses also provide a set of query tools, analytical tools, and graphical reporting facilities. Some firms use intranet portals to make data warehouse information widely available throughout a firm. Data marts are focused subsets or smaller groupings within a data warehouse. Firms often build enterprise-wide data warehouses where a central data warehouse serves the entire organization and smaller, decentralized data warehouses (called data marts) are focused on a limited portion of the organization’s data that is placed in a separate database for a specific population of users. For example, a firm might develop a smaller database on just product quality to focus efforts on quality customer and

product issues. A data mart can be constructed more quickly and at lower cost than enterprise-wide data warehouses to concentrate effort in areas of greatest concern. Once data has been captured and placed into database management systems, it is available for analysis with BA tools, including online analytical processing, as well as data, text, and Web mining technologies. Online analytical processing (OLAP) is software that allows users to view data in multiple dimensions. For example, employees can be viewed in terms of their age, sex, geographic location, and so on. OLAP would allow identification of the number of employees who are age 35, male, and in the western region of a country. OLAP allows users to obtain online answers to ad hoc questions quickly, even when the data is stored in very large databases. Data mining is a software-based, discovery-driven process that provides insights into business data by finding hidden patterns and relationships in big data or large databases and inferring rules from them to predict future behavior. The observed patterns and rules are used to guide decision-making. They can also be used to forecast the impact of those decisions. Data mining is an ideal predictive analytics tool used in the BA process mentioned in Chapter 1, “What Are Business Analytics?” The kinds of information obtained by data mining include those in Table 3.8.

Table 3.8 Types of Information Obtainable with Data Mining Technology Text mining (mentioned in Chapter 2) is a software application used to extract key elements from unstructured data sets, discover patterns and relationships in the text materials, and summarize the information. Given that the majority of the information stored in businesses is in the form of unstructured data (emails, pictures, memos, transcripts, survey responses, business receipts, and so on), the need to explore and find useful information will require increased use of text mining tools in the future. Web mining seeks to find patterns, trends, and insights into customer behavior from users of the Web. Marketers, for example, use BA services like Google Trends (www.google.com/trends/) and Google Insights for Search (http://google.about.com/od/i/g/google-insights-for-search.htm) to track the popularity of various words and phrases to learn what consumers are interested in and what they are buying. In addition to the general software applications discussed earlier, there are focused software applications used every day by BA analysts in conducting the three steps of the BA process (see Chapter 1). These include Microsoft Excel® spreadsheet applications, SAS applications, and SPSS applications. Microsoft Excel (www.microsoft.com/) spreadsheet systems have add-in applications specifically used for BA analysis. These add-in applications

broaden the use of Excel into areas of BA. Analysis ToolPak is an Excel add-in that contains a variety of statistical tools (for example, graphics and multiple regression) for the descriptive and predictive BA process steps. Another Excel add-in, Solver, contains operations research optimization tools (for example, linear programming) used in the prescriptive step of the BA process. SAS® Analytics Pro (www.sas.com/) software provides a desktop statistical toolset allowing users to access, manipulate, analyze, and present information in visual formats. It permits users to access data from nearly any source and transform it into meaningful, usable information presented in visuals that allow decision makers to gain quick understanding of critical issues within the data. It is designed for use by analysts, researchers, statisticians, engineers, and scientists who need to explore, examine, and present data in an easily understandable way and distribute findings in a variety of formats. It is a statistical package chiefly useful in the descriptive and predictive steps of the BA process. IBM’s SPSS software (www-01.ibm.com/software/analytics/spss/) offers users a wide range of statistical and decision-making tools. These tools include methodologies for data collection, statistical manipulation, modeling trends in structured and unstructured data, and optimizing analytics. Depending on the statistical packages acquired, the software can cover all three steps in the BA process. Other software applications exist to cover the prescriptive step of the BA process. One that will be used in this book is LINGO® by Lindo Systems (www.lindo.com). LINGO is a comprehensive tool designed to make building and solving optimization models faster, easier, and more efficient. LINGO provides a completely integrated package that includes an understandable language for expressing optimization models, a full-featured environment for building and editing problems, and a set of built-in solvers to handle optimization modeling in linear, nonlinear, quadratic, stochastic, and integer programming models. In summary, the technology needed to support a BA program in any organization will entail a general information system architecture, including database management systems and progress in greater specificity down to the software that BA analysts need to compute their unique contributions to the organization. Organizations with greater BA requirements will have substantially more technology to support BA efforts, but all firms that seek to

use BA as a strategy for competitive advantage will need a substantial investment in technology, because BA is a technology-dependent undertaking.
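As a rough illustration of the OLAP-style multidimensional view described earlier in this section (employees sliced by age, sex, and region), here is a small Python sketch using pandas; the employee records and column names are invented for demonstration and the sketch is not a substitute for an OLAP product.

import pandas as pd

# Invented employee records standing in for a data warehouse extract.
employees = pd.DataFrame({
    "age":    [35, 35, 42, 35, 29, 35],
    "sex":    ["M", "F", "M", "M", "F", "M"],
    "region": ["West", "West", "East", "West", "West", "East"],
})

# A cross-tabulation counts employees across all three dimensions at once.
cube = pd.crosstab(index=employees["age"],
                   columns=[employees["sex"], employees["region"]])
print(cube)

# The ad hoc question from the text: how many employees are 35, male, and in the West?
count = len(employees.query("age == 35 and sex == 'M' and region == 'West'"))
print("age 35, male, West:", count)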

Summary
The importance of BA to a firm must be weighed against what it costs. In this chapter, we have explored the costs, but also many of the benefits, of BA as a means to justify why a BA program is necessary. This chapter discussed what resources a firm would need to support a BA program. From this, three primary areas of resources were identified: personnel, data, and technology. Having identified BA personnel and needed skill sets, a review of the content of BA certification exams was presented. Types of personnel specialties also were discussed. Internal and external sources of BA data were presented as a means of data categorization. Finally, BA technology was covered, ranging from general, organization-wide information systems support down to the individual software packages that support analysts. In this chapter, we focused on the investment in resources needed to have a viable business analytics operation. In Chapter 4, we begin Part III, “How Can Business Analytics Be Applied?” Specifically, in the next chapter we will focus on how the resources mentioned in this chapter are placed into an organization and managed to achieve goals.

Discussion Questions
1. How does using BA certification exam content explain skill sets for BA analysts? What skill sets are necessary for BA personnel?
2. Why is leadership an important skill set for individuals looking to make a career in BA?
3. Why is categorizing data from its sources important in BA?
4. What is data quality, and why is it important in BA?
5. What is the difference between a data warehouse and a data mart?

References
Bartlett, R. (2013). A Practitioner’s Guide to Business Analytics. McGraw-Hill, New York, NY.
Laursen, G. H. N., Thorlund, J. (2010). Business Analytics for Managers. John Wiley & Sons, Hoboken, NJ.
Stubbs, E. (2013). Delivering Business Analytics. John Wiley & Sons, Hoboken, NJ.
Stubbs, E. (2011). The Value of Business Analytics. John Wiley & Sons, Hoboken, NJ.

Part III: How Can Business Analytics Be Applied
Chapter 4 How Do We Align Resources to Support Business Analytics within an Organization?
Chapter 5 What Are Descriptive Analytics?
Chapter 6 What Are Predictive Analytics?
Chapter 7 What Are Prescriptive Analytics?
Chapter 8 A Final Case Study Illustration

4. How Do We Align Resources to Support Business Analytics within an Organization?
Chapter objectives:
• Explain why a centralized business analytics (BA) organization structure has advantages over other structures.
• Describe the differences between BA programs, projects, and teams and how they are used to align BA resources in firms.
• Describe reasons why BA initiatives fail.
• Describe typical BA team roles and reasons for their failures.
• Explain why establishing an information policy is important.
• Explain the advantages and disadvantages of outsourcing BA.
• Describe how data can be scrubbed.
• Explain what change management involves and what its relationship is to BA.

4.1. Organization Structures Aligning Business Analytics According to Isson and Harriott (2013, p. 124), to successfully implement business analytics (BA) within organizations, the BA in whatever organizational form it takes must be fully integrated throughout a firm. This requires BA resources to be aligned in a way that permits a view of customer information within and across all departments, access to customer information from multiple sources (internal and external to the organization), access to historical analytics from a central repository, and making technology resources align to be accountable for analytic success. The commonality of these requirements is the desire for an alignment that maximizes the flow of information into and through the BA operation, which in turn processes and shares information to desired users throughout the organization. Accomplishing this information flow objective requires consideration of differing organizational structures and managerial issues that help align BA resources to best serve an organization.

4.1.1. Organization Structures As mentioned in Chapter 2, “Why Are Business Analytics Important?”, most organizations are hierarchical, with senior managers making the strategic planning decisions, middle-level managers making tactical planning decisions, and lower-level managers making operational planning decisions. Within the hierarchy, other organizational structures exist to support the development and existence of groupings of resources like those needed for BA. These additional structures include programs, projects, and teams. A program in this context is the process that seeks to create an outcome and usually involves managing several related projects with the intention of improving organizational performance. A program can also be a large project. A project tends to deliver outcomes and can be defined as having temporary rather than permanent social systems within or across organizations to accomplish particular and clearly defined tasks, usually under time constraints. Projects are often composed of teams. A team consists of a group of people with skills to achieve a common purpose. Teams are especially appropriate for conducting complex tasks that have many interdependent subtasks. The relationship of programs, projects, and teams with a business hierarchy is presented in Figure 4.1. Within this hierarchy, the organization’s senior managers establish a BA program initiative to mandate the creation of a BA grouping within the firm as a strategic goal. A BA program does not always have an end-time limit. Middle-level managers reorganize or break down the strategic BA program goals into doable BA project initiatives to be undertaken in a fixed period of time. Some firms have only one project (establish a BA grouping) and others, depending on the organization structure, have multiple BA projects requiring the creation of multiple BA groupings. Projects usually have an end-time date in which to judge the successfulness of the project. The projects in some cases are further reorganized into smaller assignments, called BA team initiatives, to operationalize the broader strategy of the BA program. BA teams may have a long-standing time limit (for example, to exist as the main source of analytics for an entire organization) or have a fixed period (for example, to work on a specific product quality problem and then end).

Figure 4.1 Hierarchal relationships program, project, and team planning In summary, one way to look at the alignment of BA resources is to view it as a progression of assigned planning tasks from a BA program, to BA projects, and eventually to BA teams for implementation. As shown in Figure 4.1, this hierarchical relationship is a way to examine how firms align planning and decision-making workload to fit strategic needs and requirements. BA organization structures usually begin with an initiative that recognizes the need to use and develop some kind of program in analytics. Fortunately, most firms today recognize this need. The question then becomes how to match the firm’s needs within the organization to achieve its strategic, tactical, and operations objectives within resource limitations. Planning the BA resource allocation within the organizational structure of a firm is a starting place for the alignment of BA to best serve a firm’s needs. Aligning the BA resources requires a determination of the amount of resources a firm wants to invest. The outcome of the resource investment might identify only one individual to compute analytics for a firm. Because of the varied skill sets in information systems, statistics, and operations research methods, a more common beginning for a BA initiative is the creation of a BA team organization structure possessing a variety of analytical and management skills. (We will discuss BA teams in Section 4.1.2.) Another way of aligning BA resources within an organization is to use a project structure. Most firms undertake projects, and some firms actually use a project structure for their entire organization. For example, consulting firms might view each client as a project (or product) and align their resources around the particular needs of that client. A project structure often

necessitates multiple BA teams to deal with a wider variety of analytic needs. Even larger investments in BA resources might be required by firms that decide to establish a whole BA department containing all the BA resources for a particular organization. Although some firms create BA departments, the departments don’t have to be large. Whatever the organization structure that is used, the role of BA is a staff (not line management) role in their advisory and consulting mission for the firm. In general, there are different ways to structure an organization to align its BA resources to serve strategic plans. In organizations where functional departments are structured on a strict hierarchy, separate BA departments or teams have to be allocated to each functional area, as presented in Figure 4.2. This functional organization structure may have the benefit of stricter functional control by the VPs of an organization and greater efficiency in focusing on just the analytics within each specialized area. On the other hand, this structure does not promote the cross-department access that is suggested as a critical success factor for the implementation of a BA program.

Figure 4.2 Functional organization structure with BA The needs of each firm for BA sometimes dictate positioning BA within existing organization functional areas. Clearly, many alternative structures can house a BA grouping. For example, because BA provides information to users, BA could be included in the functional area of management information systems, with the chief information officer (CIO) acting as both the director of information systems (which includes database management) and the leader of the BA grouping. An alternative organizational structure commonly found in large organizations aligns resources by project or product and is called a matrix organization. As illustrated in Figure 4.3, this structure allows the VPs some indirect control over their related specialists, which would include the BA specialists but also allows direct control by the project or product manager.

This, similar to the functional organizational structure, does not promote the cross-department access suggested for a successful implementation of a BA program.

Figure 4.3 Matrix organization structure The literature suggests that the organizational structure that best aligns BA resources is one in which a department, project, or team is formed in a staff structure where access to and from the BA grouping of resources permits access to all areas within a firm, as illustrated in Figure 4.4 (Laursen and Thorlund, 2010, pp. 191–192; Bartlett, 2013, pp. 109–111; Stubbs, 2011, p. 68). The dashed line indicates a staff (not line management) relationship. This centralized BA organization structure minimizes investment costs by avoiding duplications found in both the functional and the matrix styles of organization structures. At the same time, it maximizes information flow between and across functional areas in the organization. This is a logical structure for a BA group in its advisory role to the organization. Bartlett (2013, pp. 109–110) suggests other advantages of a centralized structure like the one in Figure 4.4. These include a reduction in the filtering of information

traveling upward through the organization, insulation from political interests, breakdown of the siloed functional area communication barriers, a more central platform for reviewing important analyses that require a broader field of specialists, analytic-based group decision-making efforts, separation of the line management leadership from potential clients (for example, the VP of marketing would not necessarily come between the BA group working on customer service issues for a department within marketing), and better connectivity between BA and all personnel within the area of problem solving.

Figure 4.4 Centralized BA department, project, or team organization structure
Despite the advocacy and logic recommending a centralized BA grouping, there are reasons why not all BA groupings are centralized. These reasons help explain why BA initiatives that seek to integrate and align BA resources into any type of BA group within the organization sometimes fail. The listing in Table 4.1 is not exhaustive, but it provides some of the important issues to consider in the process of structuring a BA group.

Table 4.1 Reasons for BA Initiative and Organization Failure In summary, the organizational structure that a firm may select for the positioning of their BA grouping can either be aligned within an existing organizational structure, or the BA grouping can be separate, requiring full integration within all areas of an organization. While some firms may start with a number of small teams to begin their BA program, other firms may choose to start with a full-sized BA department. Regardless of the size of the investment in BA resources, it must be aligned to allow maximum information flow between and across functional areas to achieve the most benefits BA can deliver.

4.1.2. Teams When it comes to getting the BA job done, it tends to fall to a BA team. For firms that employ BA teams the participants can be defined by the roles they play in the team effort. Some of the roles BA team participants undertake and their typical background are presented in Table 4.2.

*Source: Adapted from Stubbs (2013), pp.137–149; Stubbs (2011) Table 3.3; Laursen and Thorlund (2010), p.15. Table 4.2 BA Team Participant Roles* Aligning BA teams to achieve their tasks requires collaboration efforts from team members and from their organizations. Like BA teams, collaboration involves working with people to achieve a shared and explicit set of goals consistent with their mission. BA teams also have a specific mission to complete. Collaboration through teamwork is the means to accomplish their mission. Team members’ need for collaboration is motivated by changes in the nature of work (no more silos to hide behind, much more open environment, and so on), growth in professions (for example, interactive jobs tend to be more professional, requiring greater variety in expertise sharing), and the need to nurture innovation (creativity and innovation are fostered by

collaboration with a variety of people sharing ideas). To keep one’s job and to progress in any business career, particularly in BA, team members must encourage working with other members inside a team and out. For organizations, collaboration is motivated by the changing nature of information flow (that is, hierarchical flows tend to be downward, whereas in modern organizations, flow is in all directions) and changes in the scope of business operations (that is, going from domestic to global allows for a greater flow of ideas and information from multiple sources in multiple locations). How does a firm change its culture of work and business operations to encourage collaboration? One way to affect the culture is to provide the technology to support a more open, cross-departmental information flow. This includes e-mail, instant messaging, wikis (collaboratively edited works, like Wikipedia), use of social media and networking through Facebook and Twitter, and encouragement of activities like collaborative writing, reviewing, and editing efforts. Other technology supporting collaboration includes webinars, audio and video conferencing, and even the use of iPads to enhance face-to-face communication. These can be tools that change the culture of a firm to be more open and communicative. Reward systems should be put into place to reward team effort. Teams should be rewarded for their performance, and individuals should be rewarded for performance in a team. While middle-level managers build teams, coordinate their work, and monitor their performance, senior management should establish collaboration and teamwork as a vital function. Despite the collaboration and best of intentions, BA teams sometimes fail. There are many reasons for this, but knowing some of the more common ones can help managers avoid them. Some of the more common reasons for team failure are presented in Table 4.3. They also represent issues that can cause a BA program to become unaligned and unproductive.

Table 4.3 Reasons for BA Team Failures*
*Source: Adapted from Flynn (2008), pp. 99–106, and Stubbs (2011), p. 89.

4.2. Management Issues Aligning organizational resources is a management function. There are general management issues that are related to a BA program, and some are specifically important to operating a BA department, project, or team. The ones covered in this section include establishing an information policy, outsourcing business analytics, ensuring data quality, measuring business analytics contribution, and managing change.

4.2.1. Establishing an Information Policy
There is a need to manage information. This is accomplished by establishing an information policy to structure rules on how information and data are to be organized and maintained and who is allowed to view the data or change it. The information policy specifies organizational rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying all types of information and data. It defines the specific procedures and accountabilities that identify which users and organizational units can share information, where the information can be distributed, and who is responsible for updating and maintaining the information. In small firms, business owners might establish the information policy. For larger firms, data administration may be responsible for the specific policies and procedures for data management (Siegel and Shim, 2003, p. 280). Responsibilities could include developing the information policy, planning data collection and storage, overseeing database design, developing the data dictionary, as well as monitoring how information systems specialists and end user groups use data. A more popular term for many of the activities of data administration is data governance, which includes establishing policies and processes for managing the availability, usability, integrity, and security of the data employed in businesses. It is specifically focused on promoting data privacy, data security, data quality, and compliance with government regulations. Such information policy, data administration, and data governance must be in place to guard and ensure data is managed for the betterment of the entire organization. These steps are also important in the creation of database management systems (see Chapter 3, “What Resource Considerations Are Important to Support Business Analytics?”) and their support of BA tasks.

4.2.2. Outsourcing Business Analytics Outsourcing can be defined as a strategy by which an organization chooses to allocate some business activities and responsibilities from an internal source to an external source (Schniederjans, et al., 2005, pp. 3–4). Outsourcing business operations is a strategy that an organization can use to implement a BA program, run BA projects, and operate BA teams. Any business activity can be outsourced, including BA. Outsourcing is an important BA management activity that should be considered as a viable alternative in planning an investment in any BA program.

BA is a staff function that is easier to outsource than other line management tasks, such as running a warehouse. To determine if outsourcing is a useful option in BA programs, management needs to balance the advantages of outsourcing with its disadvantages. Some of the advantages of outsourcing BA include those listed in Table 4.4.

Table 4.4 Advantages of Outsourcing BA Nevertheless, there are disadvantages of outsourcing BA. Some of the disadvantages to outsourcing are presented in Table 4.5.

Table 4.5 Disadvantages of Outsourcing BA

Managing outsourcing of BA does not have to involve the entire department. Most firms outsource projects or tasks found to be too costly to assign internally. For example, firms outsource cloud computing services to outside vendors (Laudon and Laudon, 2012, p. 511), and other firms outsource software development or maintenance of legacy programs to offshore firms in low-wage areas of the world to cut costs (Laudon and Laudon, 2012, p. 192). Outsourcing BA can also be used as a strategy to bring BA into an organization (Schniederjans, et al., 2005, pp. 24–27). Initially, to learn how to operate a BA program, project, or team, an outsource firm can be hired for a limited, contracted time period. The client firm can then learn from the outsourcing firm’s experience and instruction. Once the outsourcing contract is over, the client firm can form its own BA department, project, or team.

4.2.3. Ensuring Data Quality
Business analytics, if it is to be relevant, must be based on data of high quality. Data quality refers to the accuracy, precision, and completeness of data. High-quality data is considered to correctly reflect the real world from which it is extracted. Poor-quality data caused by data entry errors, poorly maintained databases, out-of-date data, and incomplete data usually leads to bad decisions and undermines BA within a firm. Organizationally, the database management systems (DBMS, mentioned in Chapter 3) personnel are managerially responsible for ensuring data quality. Because of its importance and the possible location of the BA department outside of the management information systems department (which usually hosts the DBMS), it is imperative that whoever leads the BA program should seek to ensure data quality efforts are undertaken. Ideally, a properly designed database with organization-wide data standards and efforts taken to avoid duplication or inconsistent data elements should have high-quality data. Unfortunately, times are changing, and more organizations allow customers and suppliers to enter data into databases directly via the Web. As a result, most of the quality problems originate from data input such as misspelled names, transposed numbers, or incorrect or missing codes. An organization needs to identify and correct faulty data and establish routines and procedures for editing data in the database. The analysis of data quality can begin with a data quality audit, where a structured survey or inspection of the accuracy and level of completeness of data is undertaken. This

audit may be of the entire database, just a sample of files, or a survey of end users for perceptions of the data quality. If during the data quality audit files are found that have errors, a process called data cleansing or data scrubbing is undertaken to eliminate or repair data. Some of the areas in a data file that should be inspected in the audit and suggestions on how to correct them are presented in Table 4.6.

Table 4.6 Quality Data Inspection Items and Recommendations
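To make the audit-and-scrub cycle concrete, here is a minimal Python (pandas) sketch in the spirit of the inspection items above; the file layout, the valid-code list, and the repair rules are illustrative assumptions, not recommendations drawn from Table 4.6.

import pandas as pd

# Hypothetical customer file containing typical quality problems:
# a duplicated row, an invalid state code, and missing sales figures.
df = pd.DataFrame({
    "customer_id":   [101, 102, 102, 104],
    "state_code":    ["NE", "XX", "XX", "RI"],
    "monthly_sales": [1200.0, None, None, 980.0],
})
valid_codes = {"NE", "RI"}   # assumed list of valid codes for this illustration

# Audit: count duplicates, missing values, and invalid codes.
print("duplicate rows:", df.duplicated().sum())
print("missing sales:", df["monthly_sales"].isna().sum())
print("invalid codes:", (~df["state_code"].isin(valid_codes)).sum())

# Scrub: drop exact duplicates, blank out invalid codes for later review,
# and repair missing sales with the column median (one simple repair rule).
clean = df.drop_duplicates().copy()
clean.loc[~clean["state_code"].isin(valid_codes), "state_code"] = None
clean["monthly_sales"] = clean["monthly_sales"].fillna(clean["monthly_sales"].median())
print(clean)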

4.2.4. Measuring Business Analytics Contribution
The investment in BA must continually be justified by communicating the BA contribution to the organization for ongoing projects. This means that performance analytics should be computed for every BA project and BA team initiative. These analytics should provide an estimate of the tangible and intangible values being delivered to the organization. This should also involve establishing a communication strategy to promote the value being estimated. Measuring the value and contributions that BA brings to an organization is essential to helping the firm understand why the application of BA is worth the investment. Some BA contribution estimates can be computed using standard financial methods, such as payback period (how long it takes for the initial costs to be returned by profit) or return on investment (ROI) (see Schniederjans, et al., 2010, pp. 90–132), where dollar values or quantitative analysis is possible. When intangible contributions are a major part of the contribution being delivered to the firm, other methods like cost/benefit

analysis (see Schniederjans, et al., 2010, pp. 143–158), which include intangible benefits, should be used. The continued measurement of value that BA brings to a firm is not meant to be self-serving, but it aids the organization in aligning efforts to solve problems and find new business opportunities. By continually running BA initiatives, a firm is more likely to identify internal activities that should and can be enhanced by employing optimization methodologies during the prescriptive step of the BA process introduced in Chapter 1, “What Are Business Analytics?” It can also help identify underperforming assets. In addition, keeping track of investment payoffs for BA initiatives can identify areas in the organization that should have a higher priority for analysis. Indeed, past applications and allocations of BA resources that have shown significant contributions can justify the priorities established by the BA leadership about where analysis efforts should be allocated within the firm. They can also help acquire increases in data support, staff hiring, and further investments in BA technology.
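As a small worked example of the two standard financial measures mentioned above, the following sketch computes a payback period and a simple ROI; the cost and benefit figures are invented for illustration.

# Hypothetical figures for a BA project.
initial_cost = 250_000.0      # up-front investment in the BA initiative
annual_benefit = 100_000.0    # estimated yearly contribution attributable to BA

# Payback period: how long until cumulative benefits cover the initial cost.
payback_years = initial_cost / annual_benefit
print(f"Payback period: {payback_years:.1f} years")

# Return on investment over a planning horizon (here assumed to be 3 years).
horizon_years = 3
total_benefit = annual_benefit * horizon_years
roi = (total_benefit - initial_cost) / initial_cost
print(f"ROI over {horizon_years} years: {roi:.0%}")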

4.2.5. Managing Change
Wells (2000) found that what is critical in changing organizations is organizational culture and the use of change management. Organizational culture is how an organization supports cooperation, coordination, and empowerment of employees (Schermerhorn 2001, p. 38). Change management is defined as an approach for transitioning the organization (individuals, teams, projects, departments) to a changed and desired future state (Laudon and Laudon, 2012, pp. 540–542). Change management is a means of implementing change in an organization, such as adding a BA department (Schermerhorn 2001, pp. 382–390). Changes in an organization can be either planned (a result of specific and planned efforts at change with direction by a change leader) or unplanned (spontaneous changes without direction of a change leader). The application of BA invariably results in both types of change because of BA’s specific problem-solving role (a desired, planned change to solve a problem) and its exploratory, opportunity-finding nature (unplanned changes arising from newly discovered knowledge). Change management can also target almost everything that makes up an organization (see Table 4.7).

Table 4.7 Change Management Targets*
*Source: Adapted from Figure 7 in Schniederjans and Cao (2002), p. 261.
It is not possible to gain the benefits of BA without change. The intent is change that involves finding new and unique information on which change should take place in people, technology systems, or business conduct. By instituting the concept of change management within an organization, a firm can align resources and processes to more readily accept the changes that BA may suggest. Instituting the concept of change management in any firm depends on the unique characteristics of that firm. There are, though, a number of activities common to successful change management programs, and they apply equally to changes in BA departments, projects, or teams. Some of these activities that lead to change management success are presented as best practices in Table 4.8.

Table 4.8 Change Management Best Practices

Summary
Structuring a BA department, undertaking a BA project, or setting up a BA team within an organization can largely determine success in aligning resources to achieve information-sharing goals. In this chapter, several organization structures (functional, matrix, and centralized) were discussed as possible homes for BA resource groupings. The role of BA teams as an important organizational resource aligning tool was also presented. In addition, this chapter discussed reasons for BA organization and team failures. Other managerial issues included in this chapter were establishing an information policy, outsourcing business analytics, ensuring data quality, measuring business analytics contribution, and managing change. Once a firm has set up the internal organization for a BA department, program, or project, the next step is to undertake BA. In the next chapter, we begin the first of the three chapters devoted to detailing how to undertake the three steps of the BA process.

Discussion Questions
1. The literature in management information systems consistently suggests that a decentralized approach to resource allocation is the most efficient. Why then do you think the literature in BA suggests that the opposite—a centralized organization—is the best structure?
2. Why is collaboration important to BA?
3. Why is organization culture important to BA?
4. How does establishing an information policy affect BA?
5. Under what circumstances is outsourcing BA good for the development of BA in an organization?
6. Why do we have to measure BA contributions to an organization?
7. How does data quality affect BA?
8. What role does change management play in BA?

References
Bartlett, R. (2013). A Practitioner’s Guide to Business Analytics. McGraw-Hill, New York, NY.
Flynn, A. E. (2008). Leadership in Supply Management. Institute for Supply Management, Inc., Tempe, AZ.
Isson, J. P., Harriott, J. S. (2013). Win with Advanced Business Analytics. John Wiley & Sons, Hoboken, NJ.
Laursen, G. H. N., Thorlund, J. (2010). Business Analytics for Managers. John Wiley & Sons, Hoboken, NJ.
Schermerhorn, J. R. (2001). Management, 6th ed. John Wiley and Sons, New York, NY.
Schniederjans, M. J., Schniederjans, A. M., Schniederjans, D. G. (2005). Outsourcing and Insourcing in an International Context. M. E. Sharpe, Armonk, NY.
Siegel, J., Shim, J. (2003). Database Management Systems. Thomson/South-Western, Mason, OH.
Stubbs, E. (2013). Delivering Business Analytics. John Wiley & Sons, Hoboken, NJ.
Stubbs, E. (2011). The Value of Business Analytics. John Wiley & Sons, Hoboken, NJ.
Wells, M. G. (2000). “Business Process Re-Engineering Implementations Using Internet Technology.” Business Process Management Journal. Vol. 6, No. 2, pp. 164–184.

5. What Are Descriptive Analytics?
Chapter objectives:
• Explain why we need to visualize and explore data.
• Describe statistical charts and how to apply them.
• Describe descriptive statistics useful in the descriptive business analytics (BA) process.
• Describe the differences in SPSS descriptive software printouts from those covering the comparable subject in Excel.
• Describe sampling methods useful in BA and where to apply them.
• Describe what sampling estimation is and how it can aid in the BA process.
• Describe the use of confidence intervals and probability distributions.
• Explain how to undertake the descriptive analytics step in the BA process.

5.1. Introduction
In any BA undertaking, referred to as a BA initiative or project, a set of objectives is articulated. These objectives are a means to align the BA activities to support strategic goals. The objectives might be to seek out and find new business opportunities, to solve operational problems the firm is experiencing, or to grow the organization. It is from the objectives that exploration via BA originates and is in part guided. The directives that come down from the strategic planners in an organization to the BA department or analyst focus the tactical effort of the BA initiative or project. Maybe the assignment will be one of exploring internal marketing data for a new marketing product. Maybe the BA assignment will be focused on enhancing service quality by collecting engineering and customer service information. Regardless of the type of BA assignment, the first step is one of exploring data and revealing new, unique, and relevant information to help the organization advance its goals. Doing this requires an exploration of data. This chapter focuses on how to undertake the first step in the BA process: descriptive analytics. The focus in this chapter is to acquaint readers with the more common descriptive analytic tools used in this step and available in SPSS and Excel software. The treatment here is not computational but informational regarding the use and meanings of these analytic tools in support of BA. For purposes of illustration, we will use the data set in Figure

5.1 representing four different types of product sales (Sales 1, Sales 2, Sales 3, and Sales 4).

Figure 5.1 Illustrative sales data sets

5.2. Visualizing and Exploring Data
There is no single best way to explore a data set, but some way of conceptualizing what the data set looks like is needed for this step of the BA process. Charting is often employed to visualize what the data might reveal. When determining the software options to generate charts in SPSS or Excel, consider that each software package can draft a variety of charts for the selected variables in the data sets. Using the data in Figure 5.1, charts can be

created for the illustrative sales data sets. Some of these charts are discussed in Table 5.1 as a set of exploratory tools that are helpful in understanding the informational value of data sets. The chart to select depends on the objectives set for the chart.

The charts presented in Table 5.1 reveal interesting facts. The area chart is able to clearly contrast the magnitude of the values in the two variable data sets (Sales 1 and Sales 4). The column chart is useful in revealing the almost perfect linear trend in the Sales 3 data, whereas the scatter chart reveals an almost perfect nonlinear function in the Sales 4 data. Additionally, the cluttered pie chart with 20 different percentages illustrates that not all charts can or should be used in every situation. Best practices suggest charting should be viewed as an exploratory activity of BA. BA analysts should run a variety of charts and see which ones reveal interesting and useful information. Those charts can be further refined to drill down to more detailed information and more appropriate charts related to the objectives of the BA initiative. Of course, a cursory review of the Sales 4 data in Figure 5.1 makes the concave appearance of the data in the scatter chart in Table 5.1 unnecessary. But most BA problems involve big data—so large as to make it impossible to just view it and make judgment calls on structure or appearance. This is why descriptive statistics can be employed to view the data in a parameter-based way in the hopes of better understanding the information that the data has to reveal.
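For readers who want to reproduce this kind of exploratory charting outside SPSS or Excel, a minimal matplotlib sketch follows; the short sales series are invented stand-ins for some of the Figure 5.1 data, chosen only to mimic the shapes discussed above.

import matplotlib.pyplot as plt

# Invented stand-ins for three of the sales data sets in Figure 5.1.
months = list(range(1, 11))
sales1 = [12, 14, 13, 15, 16, 15, 17, 18, 17, 19]
sales3 = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]   # near-linear trend
sales4 = [4, 16, 26, 34, 40, 44, 46, 46, 44, 40]    # concave, nonlinear shape

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].fill_between(months, sales1)                 # area-style chart
axes[0].set_title("Sales 1 (area)")
axes[1].bar(months, sales3)                          # column chart
axes[1].set_title("Sales 3 (column)")
axes[2].scatter(months, sales4)                      # scatter chart
axes[2].set_title("Sales 4 (scatter)")
plt.tight_layout()
plt.show()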

5.3. Descriptive Statistics When selecting the option of descriptive statistics in SPSS or Excel, a number of useful statistics are automatically computed for the variables in the data sets. Some of these descriptive statistics are discussed in Table 5.2 as exploratory tools that are helpful in understanding the informational value of data sets.

Table 5.1 Statistical Charts Useful in BA

Fortunately, we do not need to compute these statistics to know how to use them. Computer software provides these descriptive statistics where they’re needed or requested. The SPSS descriptive statistics for the illustrative sales data sets are presented in Table 5.3, and the Excel descriptive statistics are presented in Table 5.4.
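The same kinds of summary statistics can also be reproduced with pandas for readers without SPSS or Excel at hand; the short series below is an invented, positively skewed example, so its values will not match Tables 5.3 and 5.4.

import pandas as pd

# An invented, positively skewed series standing in for a sales variable.
sales = pd.Series([12, 13, 13, 14, 15, 15, 16, 17, 18, 95])

print(sales.describe())        # count, mean, std, min, quartiles, max
print("median:", sales.median())
print("mode:", sales.mode().tolist())
print("range:", sales.max() - sales.min())
print("skewness:", round(sales.skew(), 2))   # a value above 1 signals strong positive skew
print("kurtosis:", round(sales.kurt(), 2))   # pandas reports excess kurtosis (0 for a normal curve)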

Table 5.3 SPSS Descriptive Statistics

Table 5.4 Excel Descriptive Statistics
Looking at the data sets for the four variables in Figure 5.1 and at the statistics in Tables 5.3 and 5.4, there are some obvious conclusions based on the detailed statistics from the data sets. It should be no surprise that Sales 2, with a few of the largest values and mostly smaller ones making up the data set, would have the largest variance statistics (standard deviation, sample variance, range, maximum/minimum). Also, Sales 2 is highly, positively skewed (Skewness > 1) and highly peaked (Kurtosis > 3). Note the similarity of the mean, median, and mode in Sales 1 and the dissimilarity in Sales 2. These descriptive statistics provide a more precise basis to envision the behavior of the data. Referred to as measures of central tendency, the mean, median, and mode can also be used to clearly define the direction of a skewed distribution. A negatively skewed distribution orders these measures such that mean < median < mode.

K-means cluster analysis can be selected in SPSS via Classify > K-Means Cluster Analysis. Any integer value can designate the K number of clusters desired. In this problem set, K=2. The SPSS printout of this classification process is shown in Table 6.3. The solution is referred to as a Quick Cluster because it initially selects the first two high and low values. The Initial Cluster Centers table listed the initial high (20167) and low (12369) values from the data set as the clustering process begins. As it turns out, the software divided the customers into nine high sales customers with a group mean sales of 18,309 and eleven low sales customers with a group mean sales of 14,503.

Table 6.3 SPSS K-Means Cluster Solution
Consider how large big data sets can be. Then realize that this kind of classification capability can be a useful tool for identifying and predicting sales based on the mean values. There are so many BA methodologies that no single section, chapter, or even book can explain or contain them all. The analytic treatment and computer usage in this chapter have been focused mainly on conceptual use. For a more applied use of some of these methodologies, note the case study that follows and some of the content in the appendixes.
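A minimal scikit-learn sketch of the same K-means idea with K=2 follows; the twenty sales figures are invented, so the cluster sizes and means will not reproduce the SPSS results in Table 6.3.

import numpy as np
from sklearn.cluster import KMeans

# Invented monthly sales figures for 20 customers (dollars).
sales = np.array([
    12369, 13100, 13550, 14020, 14480, 14610, 14900, 15200, 15340, 15480,
    16750, 17200, 17600, 17880, 18100, 18350, 18600, 19000, 19500, 20167,
]).reshape(-1, 1)                      # one feature (sales) per customer

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(sales)

# Report each cluster's size and mean sales, mirroring the SPSS output style.
for label in range(2):
    members = sales[kmeans.labels_ == label]
    print(f"cluster {label}: {len(members)} customers, mean sales {members.mean():.0f}")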

6.4. Continuation of Marketing/Planning Case Study Example: Predictive Analytics Step in the BA Process
In the last sections of Chapters 5, 6, and 7, an ongoing marketing/planning case study of the relevant BA step discussed in those chapters is presented to illustrate some of the tools and strategies used in a BA problem analysis. This is the second installment of the case study, dealing with the predictive analytics step in BA. The prescriptive analytics step, coming in Chapter 7, “What Are Prescriptive Analytics?”, will complete the ongoing case study.

6.4.1. Case Study Background Review
The case study firm had collected a random sample of monthly sales information, presented in Figure 6.4 in thousands of dollars. What the firm wants to know is this: given a fixed budget of $350,000 for promoting this service product when it is offered again, how should the company best allocate the budget dollars in hopes of maximizing the estimated product sales in a future month? Before making any allocation of budget, there is a need to understand how to estimate future product sales. This requires understanding the behavior of product sales relative to sales promotion efforts using radio, paper, TV, and point-of-sale (POS) ads.

Figure 6.4 Data for marketing/planning case study The previous descriptive analytics analysis in Chapter 5 revealed a potentially strong relationship between radio and TV commercials that might be useful in predicting future product sales. The analysis also revealed little regarding the relationship of newspaper and POS ads to product sales. So although radio and TV commercials are most promising, a more in-depth predictive analytics analysis is called for to accurately measure and document the degree of relationship that may exist in the variables to determine the best predictors of product sales.

6.4.2. Predictive Analytics Analysis
An ideal multiple variable modeling approach that can be used in this situation to explore variable importance and eventually lead to the development of a predictive model for product sales is correlation and multiple regression. We will use both Excel and IBM’s SPSS statistical packages to compute the statistics in this step of the BA process. First, we must consider the four independent variables—radio, TV, newspaper, POS—before developing the model. One way to see the statistical direction of the relationship (which is better than just comparing graphic charts) is to compute the Pearson correlation coefficient r between each of the independent variables and the dependent variable (product sales). The SPSS correlation coefficients and their levels of significance are presented in Table 6.4. The comparable Excel correlations are presented in Figure 6.5. Note: They do not include the level of significance but do provide correlations between all the variables being considered. The larger the Pearson correlation (regardless of the sign) and the smaller the significance test value (these are t-tests measuring the significance of the Pearson r value; see Appendix A), the more significant the relationship. Both radio and TV have statistically significant correlations, whereas at a 0.05 level of significance, paper and POS are not statistically significant.

Table 6.4 SPSS Pearson Correlation Coefficients: Marketing/Planning Case Study

Figure 6.5 Excel Pearson correlation coefficients: marketing/planning case study

Although it can be argued that a positive or negative correlation coefficient should not automatically disqualify a variable from what will become the predictive model, the negative correlation for newspaper ads suggests that as the firm increases investment in newspaper ads, product sales will decrease. This does not make sense in this case study. Given the illogic of such a relationship, the variable’s potential use as an independent variable in a model is questionable. This negative correlation also poses several questions that should be considered. Was the data set correctly collected? Is the data set accurate? Was the sample large enough to have included enough data for this variable to show a positive relationship? Should it be included for further analysis? Although it is possible for a negative relationship to show up statistically like this, it does not make sense in this case. Based on this reasoning and the fact that the correlation is not statistically significant, this variable (newspaper ads) will be removed from further consideration in this exploratory analysis to develop a predictive model. Some researchers might also exclude POS based on the insignificance (p=0.479) of its relationship with product sales. However, for purposes of illustration, it will continue to be considered a candidate for model inclusion. The other two independent variables (radio and TV) were both found to be significantly related to product sales, as reflected in the correlation coefficients in the tables.

At this point, there is a dependent variable (product sales) and three candidate independent variables (POS, TV, and radio) with which to establish a predictive model that can show the relationship between product sales and those independent variables. Just as a line chart was employed to reveal the behavior of product sales and the other variables in the descriptive analytics step, a statistical method can establish a linear model that combines the three predictive variables. We will use multiple regression, which can incorporate any number of independent variables, to establish a relational model for product sales in this case study. Multiple regression also can be used to continue our exploration of the candidacy of the three independent variables. The procedure by which multiple regression can be used to evaluate which independent variables are best to include or exclude in a linear model is called step-wise multiple regression. It is based on an evaluation of regression models and their validation statistics—specifically, the multiple correlation coefficients and the F-ratio from an ANOVA. SPSS software and many other statistical systems build in the step-wise process. Some versions are called backward step-wise regression, and some are called forward step-wise regression. Backward step-wise regression starts with all the independent variables placed in the model, and the step-wise process removes them one at a time, worst predictors first, until a statistically significant model emerges. Forward step-wise regression starts with the best-related variable (using correlation analysis as a guide) and then step-wise adds other variables until adding more will no longer improve the accuracy of the model. The forward step-wise regression process will be illustrated here manually. The first step is to generate individual regression models and statistics for each independent variable with the dependent variable, one at a time. These three models are presented in Tables 6.5, 6.6, and 6.7 for the POS, radio, and TV variables, respectively. The comparable Excel regression statistics are presented in Tables 6.8, 6.9, and 6.10 for the POS, radio, and TV variables, respectively.
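The manual forward step just described (fit each candidate variable by itself and compare the fits) can be mimicked with any regression library. A minimal sketch using statsmodels on placeholder data, reporting the R-Square and F-ratio that the chapter compares:

import numpy as np
import statsmodels.api as sm

# Placeholder data; in practice these come from the case study file
sales = np.array([1650, 1720, 1480, 1890, 2010, 1750, 1930, 2100])
candidates = {
    "POS":   np.array([30, 28, 33, 29, 31, 30, 32, 29]),
    "radio": np.array([52, 56, 44, 61, 68, 55, 63, 72]),
    "TV":    np.array([110, 115, 98, 122, 130, 112, 125, 138]),
}

for name, x in candidates.items():
    fit = sm.OLS(sales, sm.add_constant(x)).fit()   # simple one-variable regression
    print(name, round(fit.rsquared, 3), round(fit.fvalue, 2))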

Table 6.5 SPSS POS Regression Model: Marketing/Planning Case Study

Table 6.6 SPSS Radio Regression Model: Marketing/Planning Case Study

Table 6.7 SPSS TV Regression Model: Marketing/Planning Case Study

Table 6.8 Excel POS Regression Model: Marketing/Planning Case Study

Table 6.9 Excel Radio Regression Model: Marketing/Planning Case Study

Table 6.10 Excel TV Regression Model: Marketing/Planning Case Study

The computer printouts in the tables provide a variety of statistics for comparative purposes. Discussion will be limited here to just a few. The R-Square statistic is a precise proportional measure of the variation in the dependent variable that is explained by the independent variable’s behavior. The closer the R-Square is to 1.00, the more of the variation is explained, and the better the predictive variable. The three variables’ R-Squares are 0.000 (POS), 0.955 (radio), and 0.918 (TV). Clearly, radio is the best predictor variable of the three, followed by TV and, with almost no relationship, POS. This latter result was expected based on the prior Pearson correlations. The TV value suggests that only 8.2 percent (1.000 − 0.918) of the variation in product sales is left unexplained by TV commercials. From the ANOVA, the F-ratio statistic is useful in comparing each regression model’s capability to predict the dependent variable. As R-Square increases, so does the F-ratio because of the way in which they are computed and what is measured by both. The larger the F-ratio (like the R-Square statistic), the greater the statistical significance in explaining the variables’ relationships. The three variables’ F-ratios from the ANOVA tables are 0.003 (POS), 380.220 (radio), and 200.731 (TV). Both radio and TV are statistically significant, but POS has an insignificant relationship. To give some idea of how significant the relationships are, assuming a level of significance where α=0.01, a cut-off value for the F-ratio of only 8.10 is needed to designate a model as significant. Not exceeding that F-ratio (as in the case of POS at 0.003) is the same as saying that the coefficient in the regression model for POS is no different from a value of zero (no contribution to product sales).

Clearly, the independent variables radio and TV appear to have strong relationships with the dependent variable. The question is whether the two combined, or even all three variables, might provide a more accurate forecasting model than just using the one best variable, radio. Continuing with the step-wise multiple regression procedure, we next determine the possible combinations of variables to see whether a particular combination is better than the single-variable models computed previously. To measure this, we have to determine the possible combinations of the variables and compute their regression models. The combinations are (1) POS and radio, (2) POS and TV, (3) POS, radio, and TV, and (4) radio and TV. The resulting regression model statistics are summarized and presented in Table 6.11. If one is to base the selection decision solely on the R-Square statistic, there is a tie between the POS/radio/TV and the radio/TV combinations (0.979 R-Square values). If the decision is based solely on the F-ratio value from the ANOVA, one would select just the radio/TV combination, which one might expect of the two most significantly correlated variables.
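The same placeholder-data approach extends to the combination step: enumerate each candidate subset, fit a multiple regression for it, and compare R-Square, adjusted R-Square, and the F-ratio. A sketch with statsmodels (illustrative data only, not the case study file):

from itertools import combinations
import numpy as np
import statsmodels.api as sm

sales = np.array([1650, 1720, 1480, 1890, 2010, 1750, 1930, 2100])
X = {
    "POS":   np.array([30, 28, 33, 29, 31, 30, 32, 29]),
    "radio": np.array([52, 56, 44, 61, 68, 55, 63, 72]),
    "TV":    np.array([110, 115, 98, 122, 130, 112, 125, 138]),
}

for k in (2, 3):
    for combo in combinations(X, k):
        mat = sm.add_constant(np.column_stack([X[v] for v in combo]))
        fit = sm.OLS(sales, mat).fit()
        # R-Square, adjusted R-Square, and F-ratio for this combination
        print(combo, round(fit.rsquared, 3), round(fit.rsquared_adj, 3), round(fit.fvalue, 2))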

To aid in supporting a final decision and to ensure these analytics are the best possible estimates, an additional statistic can be considered. That tie breaker is the R-Squared (Adjusted) statistic, which is commonly used in multiple regression models.
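For reference, the tie-breaking adjusted R-Square reported by SPSS and Excel is a standard transformation of the ordinary R-Square that penalizes extra predictors. With n observations and k independent variables it can be written (a textbook formula, not something specific to this case study) as:

\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}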

Table 6.11 SPSS Variable Combinations and Regression Model Statistics: Marketing/Planning Case Study

The R-Square Adjusted statistic does not have the same interpretation as R-Square (a precise, proportional measure of variation in the relationship). It is instead a comparative measure of the suitability of alternative sets of independent variables, which makes it ideal for selecting between independent variables in a multiple regression model. The R-Square adjusted seeks to take into account the phenomenon of the R-Square automatically increasing whenever additional independent variables are added to the model. This phenomenon is like a painter putting paint on a canvas, where more paint additively increases the value of the painting. Yet by continually adding paint, there comes a point at which some paint covers other paint, diminishing the value of the original. Similarly, adding more variables should statistically increase the ability of the model to capture what it seeks to model. On the other hand, putting in too many variables, some of which may be poor predictors, can bring down the total predictive ability of the model. The R-Square adjusted statistic provides some information to aid in revealing this behavior. The value of the R-Square adjusted statistic can be negative, but it will always be less than or equal to the R-Square to which it is related. Unlike R-Square, the R-Square adjusted increases when a new independent variable is included only if the new variable improves the R-Square more than would be expected by chance. If a set of independent variables is introduced into a regression model one at a time in forward step-wise regression, ordered by their correlations with the dependent variable (highest first), the R-Square adjusted statistic will end up being equal to or less than the R-Square value of the resulting model. By systematic experimentation, with the R-Square adjusted recomputed for each added variable or combination, the value of the R-Square adjusted will reach a maximum and then decrease. The multiple regression model with the largest R-Square adjusted statistic will be the most accurate combination: the best fit without excessive or unnecessary independent variables. Again, just putting all the variables into a model may add unneeded variability, which can decrease its accuracy. Thinning out the variables is important.

Finally, in the step-wise multiple regression procedure, a final decision on the variables to be included in the model is needed. Basing the decision on the R-Square adjusted, the best combination is radio/TV. The SPSS multiple regression model and supporting statistics are presented in Table 6.12, and the Excel model is shown in Table 6.13.

Table 6.12 SPSS Best Variable Combination Regression Model and Statistics: Marketing/Planning Case Study

Table 6.13 Excel Best Variable Combination Regression Model and Statistics: Marketing/Planning Case Study

Although many additional analyses could be performed to validate this model, we will use the SPSS multiple regression model in Table 6.12 for the firm in this case study. The forecasting model can be expressed as follows:

Yp = –17150.455 + 275.691 X1 + 48.341 X2

where:
Yp = the estimated number of dollars of product sales
X1 = the number of dollars to invest in radio commercials
X2 = the number of dollars to invest in TV commercials

Because all the data used in the model is expressed in dollars, the interpretation of the model is easier than it would be with more complex data. The interpretation of the multiple regression model suggests that for every dollar allocated to radio commercials (represented by X1), the firm will receive $275.69 in product sales (represented by Yp in the model). Likewise, for every dollar allocated to TV commercials (represented by X2), the firm will receive $48.34 in product sales.

A caution should be mentioned on the results of this case study. Many factors might challenge a result, particularly one derived from powerful and complex methodologies like multiple regression. The results may not occur as estimated, because the model reflects only past performance, and future behavior may differ from the past. What is being suggested here is that more analysis can always be performed in questionable situations. Also, additional analysis to confirm a result should be undertaken to strengthen the trust that others must have in the results to achieve the predicted higher levels of business performance.

In summary, for this case study, the predictive analytics analysis has revealed a more detailed, quantifiable relationship between the generation of product sales and the sources of promotion that best predict sales. The best way to allocate the $350,000 budget to maximize product sales might appear to be placing the entire budget into radio commercials, because they give the best return per dollar of budget. Unfortunately, there are constraints and limitations regarding what can be allocated to the different types of promotional methods. Optimizing the allocation of a resource and maximizing business performance necessitate the use of special business analytic methods designed to accomplish this task. This requires the additional step of prescriptive analytics analysis in the BA process, which will be presented in the last section of Chapter 7.
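Before leaving the predictive step, the regression model above can be exercised directly to see how a candidate budget translates into estimated sales. A small sketch in Python; the $20,000/$50,000 split is illustrative only, not a recommendation from the case study:

def predicted_sales(radio_dollars, tv_dollars):
    # Multiple regression model from Table 6.12
    return -17150.455 + 275.691 * radio_dollars + 48.341 * tv_dollars

# Illustrative allocation: $20,000 to radio and $50,000 to TV
print(predicted_sales(20000, 50000))   # about $7.91 million in estimated product sales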

Summary

This chapter dealt with the predictive analytics step in the BA process. Specifically, it discussed logic-driven models based on experience and aided by methodologies like cause-and-effect and influence diagrams. This chapter also defined data-driven models useful in the predictive step of the BA analysis. A further discussion of data mining was presented, and data mining methodologies such as neural networks, discriminant analysis, logistic regression, and hierarchical clustering were described. An illustration of K-mean clustering using Excel was presented. Finally, this chapter discussed the second installment of a case study illustrating the predictive analytics step of the BA process. The remaining installment of the case study will be presented in Chapter 7. Once again, several of this book’s appendixes are designed to augment the chapter material by including technical, mathematical, and statistical tools. For both a greater understanding of the methodologies discussed in this chapter and a basic review of statistical and other quantitative methods, a review of the appendixes is recommended.

As previously stated, the goal of using predictive analytics is to generate a forecast or path for future improved business performance. Given this predicted path, the question now is how to exploit it as fully as possible. The purpose of the prescriptive analytics step in the BA process is to serve as a guide to fully maximize the outcome in using the information provided by the predictive analytics step. The subject of Chapter 7 is the prescriptive analytics step in the BA process.

Discussion Questions

1. Why is predictive analytics analysis the next logical step in any business analytics (BA) process?
2. Why would one use logic-driven models to aid in developing data-driven models?
3. How are neural networks helpful in determining both associations and classification tasks required in some BA analyses?
4. Why is establishing clusters important in BA?
5. Why is establishing associations important in BA?
6. How can F-tests from the ANOVA be useful in BA?

Problems

1. Using the equation developed in this chapter for predicting dollar product sales (note below), what is the forecast for dollar product sales if the firm could invest $70,000 in radio commercials and $250,000 in TV commercials?

Yp = -17150.455 + 275.691 X1 + 48.341 X2

where:
Yp = the estimated number of dollars of product sales
X1 = the number of dollars to invest in radio commercials
X2 = the number of dollars to invest in TV commercials

2. Using the same formula as in Question 1, but now using an investment of $100,000 in radio commercials and $300,000 in TV commercials, what is the prediction on dollar product sales?

3. Assume for this problem the following table would have held true for the resulting marketing/planning case study problem. Which combination of variables is estimated here to be the best predictor set? Explain why.

4. Assume for this problem that the following table would have held true for the resulting marketing/planning case study problem. Which of the variables is estimated here to be the best predictor? Explain why.

5. Given the coefficients table that follows, what is the resulting regression model for TV and product sales? Is TV a good predictor of product sales according to this SPSS printout? Explain.


7. What Are Prescriptive Analytics?

Chapter objectives:

• List and describe the commonly used prescriptive analytics in the business analytics (BA) process.
• Explain the role of case studies in prescriptive analytics.
• Explain how curve fitting can be used in prescriptive analytics.
• Explain how to formulate a linear programming model.
• Explain the value of linear programming in the prescriptive analytics step of BA.

7.1. Introduction

After undertaking the descriptive and predictive analytics steps in the BA process, one should be positioned to undertake the final step: prescriptive analytics analysis. The prior analysis should provide a forecast or prediction of what future trends in the business may hold. For example, there may be significant statistical measures of increased (or decreased) sales, profitability trends accurately measured in dollars for new market opportunities, or measured cost savings from a future joint venture. If a firm knows where the future lies by forecasting trends, it can best plan to take advantage of the possible opportunities that those trends may offer.

Step 3 of the BA process, prescriptive analytics, involves the application of decision science, management science, or operations research methodologies to make best use of allocable resources. These are mathematically based methodologies and algorithms designed to take variables and other parameters into a quantitative framework and generate an optimal or near-optimal solution to complex problems. These methodologies can be used to optimally allocate a firm’s limited resources to take best advantage of the opportunities it has found in the predicted future trends. Limits on human, technology, and financial resources prevent any firm from going after all the opportunities. Using prescriptive analytics allows the firm to allocate limited resources to optimally or near-optimally achieve its objectives as fully as possible. In Chapter 3, “What Resource Considerations Are Important to Support Business Analytics?” the relationships of methodologies to the BA process were expressed as a function of certification exam content. The listing of the prescriptive analytic methodologies, as they are in some cases utilized in the BA process, is again presented in Figure 7.1 to form the basis of this chapter’s content.

Figure 7.1 Prescriptive analytic methodologies

7.2. Prescriptive Modeling

The listing of prescriptive analytic methods and models in Figure 7.1 is but a small grouping of the many operations research, decision science, and management science methodologies that are applied in this step of the BA process. The explanation and use of most of the methodologies in Table 7.1 appear throughout this book. (See the Additional Information column in Table 7.1.)

Table 7.1 Select Prescriptive Analytic Models

7.3. Nonlinear Optimization

The prescriptive methodologies in Table 7.1 are explained in detail in the referenced chapters and appendixes, but nonlinear optimization will be discussed here. When business performance cost or profit functions become too complex for simple linear models to be useful, exploration of nonlinear functions is a standard practice in BA. Although the predictive nature of exploring for a mathematical expression to denote a trend or establish a forecast falls mainly in the predictive analytics step of BA, the use of the nonlinear function to optimize a decision can fall in the prescriptive analytics step. As mentioned previously, there are many nonlinear mathematical programming methodologies and solution procedures designed to generate optimal business performance solutions. Most of them require careful estimation of parameters that may or may not be accurate, particularly given the precision required of a solution that can be so precariously dependent upon parameter accuracy. This precision is further complicated in BA by the large data files that should be factored into the model-building effort. To overcome these limitations and be more inclusive in the use of large data, regression software can be applied. As illustrated in Appendix E, curve-fitting software can be used to generate predictive analytic models that can also be utilized to aid in making prescriptive analytic decisions. For purposes of illustration, SPSS’s Curve Fitting software will be used in this chapter.

Suppose that a resource allocation decision is being faced whereby one must decide how many computer servers a service facility should purchase to optimize the firm’s costs of running the facility. The firm’s predictive analytics effort has shown a growth trend. A new facility is called for if costs can be minimized. The firm has a history of setting up large and small service facilities and has collected the 20 data points in Figure 7.2. Whether there are 20 or 20,000 items in the data file, this SPSS function fits a nonlinear line to the data, based on regression mathematics, so that the distance from the data items to the line is minimized. The software then converts the line into a mathematical expression useful for forecasting.

Figure 7.2 Data and SPSS Curve Fitting function selection window

In this server problem, the basic data has a u-shaped function, as presented in Figure 7.3. This is a classic shape for most cost functions in business. In this problem, it represents the balancing of having too few servers (resulting in a costly loss of customer business through dissatisfaction and complaints with the service) or too many servers (excessive waste in investment costs as a result of underutilized servers). Although this is an overly simplified example with little and nicely ordered data for clarity purposes, in big data situations, cost functions are considerably less obvious.

Figure 7.3 Server problem basic data cost function

The first step in using the curve-fitting methodology is to generate the best-fitting curve to the data. By selecting all the SPSS models in Figure 7.2, the software applies each point of data using the regression process of minimizing distance from a line. The result is a series of regression models and statistics, including ANOVA and other testing statistics. It is known from the previous illustration of regression that the adjusted R-Square statistic can reveal the best estimated relationship between the independent (number of servers) and dependent (total cost) variables. These statistics are presented in Table 7.2. The best adjusted R-Square value (the largest) occurs with the quadratic model, followed by the cubic model. The more detailed supporting statistics for both of these models are presented in Table 7.3. The graph for all the SPSS curve-fitting models appears in Figure 7.4.
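Outside of SPSS, the same idea (fit several candidate curves and compare them) can be sketched with ordinary least-squares polynomial fitting. The data below are synthetic stand-ins for the Figure 7.2 file, and numpy.polyfit is just one convenient fitting routine:

import numpy as np

# Synthetic (servers, total cost) observations shaped like a u-curve; not the Figure 7.2 data
servers = np.arange(1, 21)
rng = np.random.default_rng(7)
cost = 35000 - 5500 * servers + 265 * servers**2 + rng.normal(0, 300, servers.size)

quadratic = np.polyfit(servers, cost, 2)   # coefficients, highest power first
cubic = np.polyfit(servers, cost, 3)
print(np.round(quadratic, 2))
print(np.round(cubic, 2))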

Table 7.2 Adjusted R-Square Values of All SPSS Models

Table 7.3 Quadratic and Cubic Model SPSS Statistics

Figure 7.4 Graph of all SPSS curve-fitting models

From Table 7.3, the two statistically significant curve-fitted models follow:

Yp = 35417.772 − 5589.432 X + 268.445 X^2 [Quadratic model]
Yp = 36133.696 − 5954.738 X + 310.895 X^2 − 1.347 X^3 [Cubic model]

where:
Yp = the forecasted or predicted total cost
X = the number of computer servers

For purposes of illustration, we will use the quadratic model. In the next step of using the curve-fitting models, one can either use calculus to derive the cost-minimizing value for X (the number of servers) or perform a deterministic simulation in which values of X are substituted into the model to compute and predict the total cost (Yp). The calculus-based approach is presented in the “Addendum” section of this chapter.

As a simpler solution method for finding the optimal number of servers, simulation can be used. Representing a deterministic simulation (see Appendix F, Section F.2.1), the resulting costs of the server alternatives can be computed using the quadratic model, as presented in Figure 7.5. These values were computed by plugging the number-of-servers values (1 to 20) into the Yp quadratic function one at a time to generate the predicted total cost for each server possibility. Note that the lowest predicted value occurs with the acquisition of 10 servers at $6367.952, and the next lowest is at 11 servers at $6415.865. In the actual data in Figure 7.2, the minimum total cost point occurs at 9 servers at $4533, whereas the next lowest total cost is $4678, occurring at 10 servers. The differences are due to the estimation process of curve fitting. Note in Figure 7.4 that the fitted curve does not touch the lowest 5 cost values. Like regression in general, curve fitting is an estimation process, and although the ANOVA statistics for the quadratic model demonstrate a strong relationship with the actual values, there is some error. This process provides a near-optimal solution but does not guarantee one.
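The deterministic simulation summarized in Figure 7.5 is easy to reproduce once the quadratic model is in hand: evaluate the fitted cost function at each candidate server count and pick the smallest. A sketch using the chapter's quadratic coefficients:

# Quadratic cost model fitted by SPSS (coefficients from Table 7.3)
def predicted_cost(servers):
    return 35417.772 - 5589.432 * servers + 268.445 * servers ** 2

costs = {x: predicted_cost(x) for x in range(1, 21)}
best = min(costs, key=costs.get)
print(best, round(costs[best], 3))   # 10 servers at roughly $6,368 predicted total cost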

Figure 7.5 Predicted total cost in server problem for each server alternative

Like all regression models, curve fitting is an estimation process and has risks, but the supporting statistics, like ANOVA, provide some degree of confidence in the resulting solution.

Finally, it must be mentioned that many other nonlinear optimization methodologies exist. Some, like quadratic programming, are considered constrained optimization models (like LP). These topics are beyond the scope of this book. For additional information on nonlinear programming, see King and Wallace (2013), Betts (2009), and Williams (2013). Other methodologies, like the use of calculus in this chapter, are useful in solving for optimal solutions in unconstrained problem settings. For additional information on calculus methods, see Spillers and MacBain (2009), Luptacik (2010), and Kwak and Schniederjans (1987).

7.4. Continuation of Marketing/Planning Case Study Example: Prescriptive Step in the BA Analysis

In Chapter 5, “What Are Descriptive Analytics?” and Chapter 6, “What Are Predictive Analytics?” an ongoing marketing/planning case study was presented to illustrate some of the tools and strategies used in a BA problem analysis. This is the third and final installment of the case study, dealing with the prescriptive analytics step in BA.

7.4.1. Case Background Review

The predictive analytics analysis in Chapter 6 revealed a statistically strong relationship of radio and TV commercials to product sales that makes them useful in predicting future product sales. The ramifications of these results suggest a better allocation of funds away from newspaper and POS ads to radio and TV commercials. Determining how much of the $350,000 budget should be allocated between the two types of commercials requires the application of an optimization decision-making methodology.

7.4.2. Prescriptive Analysis

The allocation problem of the budget to purchase radio and TV commercials is a multivariable (there are two media to consider), constrained (there are some limitations on how one can allocate the budget funds), optimization problem (BA always seeks to optimize business performance). Many optimization methods could be employed to determine a solution to this problem. Considering the singular objective of maximizing estimated product sales, linear programming (LP) is an ideal methodology to apply in this situation. To employ LP to model this problem, use the five-step LP formulation procedure explained in Appendix B.

7.4.2.1. Formulation of LP Marketing/Planning Model

In the process of exploring the allocation options, a number of limitations or constraints on placing radio and TV commercials were observed. The total budget for all the commercials was set at a maximum of $350,000 for the next monthly campaign. To receive the price discount on radio commercials, a minimum budget investment in radio of $15,000 is required, and to receive the price discount on TV commercials, a minimum of $75,000 is necessary. Because the radio and TV stations are owned by the same corporation, there is an agreement that for every dollar of radio commercials required, the client firm must purchase $2 in TV commercials. Given these limitations and the modeled relationship found in the previous predictive analysis, one can formulate the budget allocation decision as an LP model using a five-step LP formulation procedure (see Appendix B, Section B.4.1):

1. Determine the type of problem—This problem seeks to maximize dollar product sales by determining how to allocate budget dollars over radio and TV commercials. For each dollar of radio commercials estimated with the regression model, $275.691 will be received, and for each dollar of TV commercials, $48.341 will be received. Those two parameters are the product sales values to maximize. Therefore, it will be a maximization model.

2. Define the decision variables—The decision variables for the LP model are derived from the multiple regression model’s independent variables. The only adjustment is the monthly timeliness of the allocation of the budget:

X1 = the number of dollars to invest in radio commercials for the next monthly campaign
X2 = the number of dollars to invest in TV commercials for the next monthly campaign

3. Formulate the objective function—Because the multiple regression model defines the dollar sales as a linear function with the two independent variables, the same dollar coefficients from the regression model can be used as the contribution coefficients in the objective function. This results in the following LP model objective function:

Maximize: Z = 275.691 X1 + 48.341 X2

4. Formulate the constraints—Given the information on the limitations in this problem, there are four constraints:

Constraint 1—No more than $350,000 is allowed for the total budget to allocate to both radio (X1) and TV (X2) commercials. So add X1 + X2 and set it less than or equal to 350,000 to formulate the first constraint as follows:

X1 + X2 ≤ 350000

Constraint 2—To get a discount on radio (X1) commercials, a minimum of $15,000 must be allocated to radio. The constraint for this limitation follows:

X1 ≥ 15000

Constraint 3—Similar to Constraint 2, to get a discount on TV (X2) commercials, a minimum of $75,000 must be allocated to TV. The constraint for this limitation follows:

X2 ≥ 75000

Constraint 4—This is a blending problem constraint (see Appendix B, Section B.6.3). What is needed is to express the relationship as follows:

X1 / X2 = 1 / 2

which is to say, for each one unit of X1, one must acquire two units of X2. Said differently, the ratio of one unit of X1 is equal to two units of X2. Given the expression, use algebra to cross-multiply such that:

2 X1 = X2

Convert it into an acceptable constraint with a constant on the right side and the variables on the left side as follows:

2 X1 − X2 = 0

5. State the Nonnegativity and Given Requirements—With only two variables, this formal requirement in the formulation of an LP model is expressed as follows:

X1, X2 ≥ 0

Because these variables are in dollars, they do not have to be integer values. (They can be any real or cardinal number.) The complete LP model formulation is given here:
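Maximize: Z = 275.691 X1 + 48.341 X2
Subject to:
X1 + X2 ≤ 350000
X1 ≥ 15000
X2 ≥ 75000
2 X1 − X2 = 0
and
X1, X2 ≥ 0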

7.4.2.2. Solution for the LP Marketing/Planning Model

From Appendix B, one knows that both Excel and LINGO software can be used to run the LP model and solve the budget allocation in this marketing/planning case study problem. For purposes of brevity, discussion will be limited to just LINGO. As presented in Appendix B, LINGO is a mathematical programming language and software system. It allows the fairly simple statement of the LP model to be entered into a single window and run to generate LP solutions. LINGO opens with a blank window for entering whatever type of model is desired. After entering the LP model formulation into the LINGO software, the resulting data entry information is presented in Figure 7.6.

Figure 7.6 LINGO LP model entry requirements: marketing/planning case study

There are several minor differences in the model entry requirements over the usual LP model formulation. These differences are required to run a model in LINGO. These include (1) using the term “Max” instead of “Maximize,” (2) dropping off “Subject to” and “and” in the model formulation, (3) placing an asterisk and a space between unknowns and constant values in the objective and constraint functions where multiplication is required, (4) ending each expression with a semicolon, and (5) omitting the nonnegativity requirements, which aren’t necessary. Having entered the model into LINGO, a single click on the SOLVE option in the bar at the top of the window generates a solution. The marketing budget allocation LP model solution is found in Figure 7.7.

Figure 7.7 LINGO LP model solution: marketing/planning case study

As it turns out, the optimal distribution of the $350,000 promotion budget is to allocate $116,666.70 to radio commercials and $233,333.30 to TV commercials. The resulting Z value, which in this model is the total predicted product sales in dollars, is 0.4344352E+08, or $43,443,524. Comparing that future estimated month’s product sales with the average current monthly product sales of $16,717,200 presented in Figure 7.7, it does appear that the firm in this case study will optimally maximize future estimated monthly product sales if it allocates the budget accordingly (that is, if the multiple regression model estimates and the other parameters in the LP model hold accurate and true).

In summary, the prescriptive analytics analysis step brings the prior statistical analytic steps into an applied decision-making process where a potential business performance improvement is shown to better this organization’s ability to use its resources more effectively. The management job of monitoring performance and checking to see that business performance is in fact improved is a needed final step in the BA analysis. Without proof that business performance is improved, it’s unlikely that BA would continue to be used.
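For readers without LINGO, the same allocation can be cross-checked with a short script. This is only a sketch using SciPy's general-purpose linprog solver (which minimizes, so the objective is negated); it is not the tool used in the book:

from scipy.optimize import linprog

# Maximize 275.691*X1 + 48.341*X2 by minimizing its negation
c = [-275.691, -48.341]
A_ub = [[1, 1],     # X1 + X2 <= 350000 (total budget)
        [-1, 0],    # X1 >= 15000 rewritten as -X1 <= -15000
        [0, -1]]    # X2 >= 75000 rewritten as -X2 <= -75000
b_ub = [350000, -15000, -75000]
A_eq = [[2, -1]]    # 2*X1 - X2 = 0 (two TV dollars per radio dollar)
b_eq = [0]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
print(result.x, -result.fun)   # about [116666.67, 233333.33] and roughly $43.44 million in predicted sales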

7.4.2.3. Final Comment on the Marketing/Planning Model

Although the LP solution methodology used to generate an allocation solution guarantees an optimal LP solution, it does not guarantee that the firm using this model’s solution will achieve the results suggested in the analysis. Like any estimation process, the numbers are only predictions, not assurances of outcomes. The high levels of significance in the statistical analysis and the added use of other confirming statistics (R-Square, adjusted R-Square, ANOVA, and so on) in the model development provide some assurance of predictive validity. There are many other methods and approaches that could have been used in this case study. Learning how to use more statistical and decision science tools helps ensure a better solution in the final analysis.

Summary

This chapter discussed the prescriptive analytics step in the BA process. Specifically, this chapter revisited and briefly discussed methodologies suggested in BA certification exams. An illustration of nonlinear optimization was presented to demonstrate how the combination of software and mathematics can generate useful decision-making information. Finally, this chapter presented the third installment of a marketing/planning case study illustrating how prescriptive analytics can benefit the BA process. We end this book with a final application of the BA process. Once again, several of the appendixes are designed to augment this chapter’s content by including technical, mathematical, and statistical tools. For both a greater understanding of the methodologies discussed in this chapter and a basic review of statistical and other quantitative methods, a review of the appendixes and chapters is recommended.

Addendum

The differential calculus method for finding the minimum cost point on the quadratic function that follows involves a couple of steps. It finds the zero-slope point on the cost function (the point at the bottom of the u-shaped curve where a line could be drawn that would have a zero slope). There are limitations to its use, and qualifying conditions are required to prove minimum or maximum positions on a curve. The quadratic model in the server problem follows:

Yp = 35417.772 − 5589.432 X + 268.445 X^2 [Quadratic model]

Step 1. Given the quadratic function above, take its first derivative:

d(Yp)/dX = − 5589.432 + 536.89 X

Step 2. Set the derivative function equal to zero and solve for X:

0 = − 5589.432 + 536.89 X
X = 10.410758

Slightly more than ten servers should be purchased at the resulting optimally minimized cost value. This approach provides a near-optimal solution but does not guarantee one. For additional information on the application of calculus, see Field, M.J. (2012) and Dobrushkin, V.A. (2014).
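The zero-slope calculation can also be checked symbolically. A minimal sketch with SymPy; the library choice is an assumption of available tooling, not part of the chapter's method:

import sympy as sp

x = sp.symbols('x')
cost = 35417.772 - 5589.432 * x + 268.445 * x ** 2   # quadratic model from the chapter

slope = sp.diff(cost, x)        # first derivative
print(sp.solve(slope, x))       # about [10.41], the zero-slope server count
print(sp.diff(cost, x, 2))      # second derivative is about 536.89, positive, so this is a minimum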

Discussion Questions

1. How are prescriptive and descriptive analytics related?
2. How can we use simulation in both predictive and prescriptive analytics?
3. Why in the server problem were there so few statistically significant models?
4. Does it make sense that the resulting quadratic model in Figure 7.4 did not touch the lowest cost data points in the data file? Explain.
5. What conditions allowed the application of LP?

Problems

1. A computer services company sells computer services to industrial users. The company’s analytics officer has predicted the need for growth to meet competitive pressures. To implement this strategy, upper management has determined that the company would tactically expand its sales and service organization. In this expansion, new districts would be defined, and newly hired or appointed managers would be placed in charge to establish and run the new districts. The first job of the new district managers would be to select the sales people and staff support employees for their districts. To aid the new district managers in deciding on the number of sales people and staffers to hire, the company researched existing office operations and made a number of analytic-based observations, which they passed on to the new district managers. A new manager’s district should, at the very least, have 14 sales people and 4 staffers to achieve adequate customer service. Research has indicated that a district manager could adequately manage the equivalent of no more than 32 employees. Sales people are twice as time consuming to manage as staffers. The district manager was assigned part of the floor in an office building for operations. This space could house no more than 20 sales people and staffers. The district manager had some discretion regarding budgetary limitations. A total payroll budget for sales people and staffers was set at $600,000. The company’s policy in developing a new territory would be to pay sales people a fixed salary instead of commissions and salary. The yearly salary of a beginning sales person would be $36,000, whereas a staffer would receive $18,000. All the sales people and staffers being hired for this district would be new with the company and, as such, would start with the basic salaries mentioned. Finally, the source of prospective sales people and staffers would be virtually unlimited in the district and pose no constraint on the problem situation. What is the LP formulation of this model?

2. (This problem requires computer support.) What is the optimal answer to the problem formulated in Problem 1?

3. A trucking firm must transport exactly 900, 800, 700, and 1,000 units of a product to four cities: A, B, C, and D. The product is manufactured and supplied in two other cities, X and Y, in the exact amounts to match the total demand. The production of units from the two cities is 1,900 and 1,500 units for X and Y, respectively. The cost per unit to transport the product between the manufacturing plants in cities X and Y and the demand market cities A, B, C, and D is given here:

For example, in the table, $0.65 is the cost to ship one unit from Supply Plant X to Demand Market A. The trucking firm needs to know how many units should be shipped from each supply city to each demand city in such a way that it minimizes total costs. Hint: This is a multidimensional decision variable problem (see Section B.6.4 in Appendix B). What is the LP model formulation for this problem? 4. (This problem requires computer support.) What is the optimal answer to the problem formulated in Problem 3?

References

Adkins, T.C. (2006). Case Studies in Performance Management: A Guide from the Experts. Wiley, New York, NY.
Albright, S.C., Winston, W.L. (2014). Business Analytics: Data Analysis & Decision Making. Cengage Learning, Stamford, CT.
Betts, J.T. (2009). Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, 2nd ed. Society for Industrial & Applied Mathematics, London.
Cooper, W.W., Seiford, L.M., Zhu, J. (2013). Handbook on Data Envelopment Analysis. Springer, New York, NY.
Dobrushkin, V.A. (2014). Applied Differential Equations: An Introduction. Chapman and Hall/CRC, New York, NY.
Field, M.J. (2012). Differential Calculus and Its Applications. Dover Publishing, Mineola, NY.
Hillier, F.S. (2014). Introduction to Operations Research, 10th ed. McGraw-Hill Higher Education, Boston, MA.
King, A.J., Wallace, S.W. (2013). Modeling with Stochastic Programming. Springer, New York, NY.

Kwak, N.K., Schniederjans, M.J. (1987). Introduction to Mathematical Programming. Krieger Publishing, Malabar, FL.
Liebowitz, J. (2014). Business Analytics: An Introduction. Auerbach Publications, New York, NY.
Luptacik, M. (2010). Mathematical Optimization and Economic Analysis. Springer, New York, NY.
Rothlauf, F. (2013). Design of Modern Heuristics: Principles and Application. Springer, New York, NY.
Sekaran, U., Bougie, R. (2013). Research Methods for Business: A Skill-Building Approach. Wiley, New York, NY.
Spillers, W.R., MacBain, K.M. (2009). Structural Optimization. Springer, New York, NY.
Williams, H.P. (2013). Model Building in Mathematical Programming. Wiley, New York, NY.

8. A Final Business Analytics Case Problem

Chapter objectives:

• Provide a capstone business analytics (BA) overview within a case study problem.
• Show the step-wise connections of the descriptive, predictive, and prescriptive steps in the BA process.

8.1. Introduction

In Parts I, “What Are Business Analytics?” and II, “Why Are Business Analytics Important?” (Chapters 1 through 3), this book explained what BA is about and why it is important to business organization decision-making. In Part III, “How Can Business Analytics Be Applied?” (Chapters 4 through 7), we explained and illustrated how BA can be applied using a variety of different concepts and methodologies. Completing Part III, we seek in this chapter a closing illustration of how the BA process can be applied by presenting a final case study. This case study is meant as a capstone learning experience on the business analytics process discussed throughout the book. Several of the concepts and methodologies presented in prior chapters and the appendixes will once again be applied here. As will be seen in this case study, unique metrics and measures are sometimes needed in a BA setting to effect a solution to a problem or answer a question. Therefore, the methodologies and approach used in this chapter should be viewed as just one approach to obtaining the desired information.

Undertaking the analytic steps in the BA process (see Chapter 1, “What Are Business Analytics?”) requires a beginning effort that precedes data collection. This prerequisite to BA is to understand the business systems that are a part of the problem. When the BA effort has been outsourced (see Chapter 4, “How Do We Align Resources to Support Business Analytics within an Organization?”) or when it is completely performed in-house by a BA team (Chapter 3, “What Resource Considerations Are Important to Support Business Analytics?”), experienced managers must be brought into the process to provide the necessary systems behavior and general knowledge of operations needed to eventually model and explain how the business operates. In this case study, it is assumed that such staff or information is available. Based on this information, a BA project can be undertaken.

8.2. Case Study: Problem Background and Data

A Midwest US commercial manufacturing firm is facing a supply chain problem. The manufacturer produces and sells a single product, a general-purpose small motor, as a component part to different customers who incorporate the motor into their various finished products. The manufacturer has a supply chain network that connects production centers located in St. Louis, Missouri, and Dallas, Texas, with six warehouse facilities that serve commercial customers located in Kansas City, Missouri; Chicago, Illinois; Houston, Texas; Oklahoma City, Oklahoma; Omaha, Nebraska; and Little Rock, Arkansas. Part of the supply chain problem is the need to keep the cost of shipping motors to the customers as low as possible. The manufacturer adopted a lean management philosophy that seeks to match what it produces with what is demanded at each warehouse. The problem with implementing this philosophy is complicated by the inability to forecast the customer demand month to month. If the forecast of customer demand is too low and not enough inventory is available (an underage of inventory), the manufacturer has to rush order motors that end up being costly to the manufacturer. If the forecast is too high and the manufacturer produces and ships unwanted inventory (an overage of inventory), the warehouse incurs wasteful storage costs. The management of the manufacturing firm has decided that an analytics-based procedure needs to be developed to improve overall business performance. This would be a procedure that analysts could use each month to develop an optimal supply chain schedule of shipments from the two supply centers to the six warehouse demand destinations that would minimize costs. A key part of this procedure would be to include a means to accurately forecast customer demand and an optimization process for shipping products from the manufacturing centers to the warehouse demand destinations. The manufacturing firm created a small BA team to develop the procedure (see Chapter 4, Section 4.1.1). The BA team consists of a BA analyst (who would be responsible for using the procedure and heads the BA team), the supply chain general manager, the shipping manager (responsible for drafting the shipping schedule), and a warehouse manager (whose job it is to develop monthly forecasts).

8.3. Descriptive Analytics Analysis

Developing a procedure by which analyst teams can determine optimal shipments between supply sources and demand destinations requires differing types of data; supply, demand, and cost data are all required to plan shipments. The total manufactured supply of motors produced at the St. Louis and Dallas plants is determined once the forecast demand is established. The BA team established that there is ample capacity between both plants to satisfy the forecasted customer demand at the six warehouse demand destinations. The BA team determined that the cost data for shipping a motor from the production centers to the customers depends largely on the distance between the cities, as the items are trucked directly by the manufacturer to the warehouses. The cost data per motor shipped to a customer is given in Table 8.1. For example, it costs the manufacturer $4 per motor to ship from St. Louis to Kansas City. These cost values are routinely computed by the manufacturer’s cost accounting department and are assumed by the BA team to be accurate.

Table 8.1 Estimated Shipping Costs Per Motor

The present system of forecasting customer demand usually results in costly overages and underages shipped to the warehouses. In the past, the manufacturer would take a three-value smoothing average to estimate the monthly demand. (See Section E.6.1 in Appendix E, “Forecasting.”) This evolved by taking the last three months of actual customer motor demand and averaging them to produce a forecast for the next month. The process was repeated each month for each of the six warehouses. Not making products available when customers demanded them caused lost sales, so the manufacturer would rush and ship products to customers at a loss. On the other hand, producing too much inventory meant needless production, inventory, and shipping costs.

To deal with the variability in customer demand forecasting, models for each warehouse’s customer demand would need to be developed. The customer demand data on which to build the models was collected from prior monthly demand in motors. To determine which data to include in a final sample and which to exclude, a few simple rules were adopted to eliminate potentially useless and out-of-date data. Going back more than 27 months invited cyclical variations caused by changes in the economy that were no longer present, so that data was removed. Unfortunately, some of the data files were incomplete and required cleansing (see Chapter 4). The resulting time series data collected on warehouse customer monthly demand files is presented in Table 8.2. It was decided that the most recent three months (darkened months of 25, 26, and 27) would not be included in the model development, but instead would be used for validation purposes to confirm the forecasting accuracy of the resulting models. This is similar to what was referred to as a training data set and a validation data set (see Section 6.3.1 in Chapter 6, “What Are Predictive Analytics?”).
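The firm's previous forecasting rule (average the last three months of actual demand) is simple enough to state in a couple of lines. A sketch with made-up demand figures, not the Table 8.2 data:

# Hypothetical monthly demand history (motors) for one warehouse
demand = [820, 790, 845, 910, 880, 935]

# Next month's forecast = mean of the three most recent actual months
forecast = sum(demand[-3:]) / 3
print(round(forecast, 1))   # 908.3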

Table 8.2 Actual Monthly Customer Demand in Motors

As a part of the descriptive analysis, summary statistics were generated from both Excel (Table 8.3) and SPSS (Table 8.4). The mean values provide some basis for a monthly demand rate, but at this point consideration of overall behavior within the data distributions is required to more accurately capture the relevant variation. To that end, other statistics can provide some picture of the distribution of the data. For example, the kurtosis coefficient (see Chapter 5, “What Are Descriptive Analytics?”) for Omaha’s demand suggests a peaked distribution. This indicates that the values are closely grouped about the mean, implying a lack of variability in forecast values (a good thing). Note that the Standard Error statistic (see Chapter 5, Section 5.3) for Omaha is the smallest. Other statistics, such as the skewness coefficients, suggest that most of the distributions are negatively skewed. In such distributions the median sits at a larger value than the mean, which implies that the mean and mean-related statistics might not measure the entire distribution’s behavior as accurately as other measures (like the median).
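The same summary statistics can be produced outside Excel or SPSS. A sketch with pandas on placeholder demand series (the Omaha-like series is deliberately tight around its mean, the Kansas City-like series deliberately erratic):

import pandas as pd

# Placeholder demand series, not the Table 8.2 data
df = pd.DataFrame({
    "Omaha":       [430, 445, 438, 441, 436, 444, 439, 442],
    "Kansas City": [510, 380, 620, 455, 700, 330, 590, 410],
})

summary = pd.DataFrame({
    "mean":      df.mean(),
    "std error": df.sem(),    # standard error of the mean
    "skewness":  df.skew(),
    "kurtosis":  df.kurt(),
})
print(summary)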

Table 8.3 Excel Summary Statistics of Actual Monthly Customer Demand in Motors

Table 8.4 SPSS Summary Statistics of Actual Monthly Customer Demand in Motors

To better depict the general shape of the data and to understand their behavior, line graphs (see Chapter 5, Section 5.2) of the six customer demand files were generated using SPSS and are shown in Figures 8.1 to 8.6. (The Excel versions look the same and will not be displayed.) As expected from the summary statistics, and now visually from the graphs, some of the customer demand functions look fairly linear, others are clearly nonlinear, and some possess so much variation that they are nearly unrecognizable. The almost perfectly linear customer demand behavior at the warehouses in Chicago (Figure 8.2) and Oklahoma City (Figure 8.4) suggests the use of a simple linear regression model for forecasting purposes. The very clear, bell-shaped, nonlinear functions for Houston (Figure 8.3) and Little Rock (Figure 8.6) suggest that a nonlinear regression model should be determined by the BA team to find the best-fitting forecasting model. Finally, the excessively random customer demand behavior for Kansas City (Figure 8.1) and Omaha (Figure 8.5) suggests that considerable effort is needed to find a model that may or may not explain the variation in the data well enough for a reliable forecast. There appear to be many time series variations (see Appendix E, Section E.2) in customer demand for the warehouses in these two cities.

Figure 8.1 Graph of Kansas City customer demand

Figure 8.2 Graph of Chicago customer demand

Figure 8.3 Graph of Houston customer demand

Figure 8.4 Graph of Oklahoma City customer demand

Figure 8.5 Graph of Omaha customer demand

Figure 8.6 Graph of Little Rock customer demand

The fact that two of the six warehouse time series data files have more time series variations than the other four warehouse files does not prevent, in this case (and in most others), a fairly accurate forecast. Because four of the six customer demand warehouses appear to have a fairly observable pattern of behavior, they will help improve the overall accuracy even with the substantial variations of the other two warehouses adding in some forecast error.

8.4. Predictive Analytics Analysis

In this section, we continue with our illustrative example. Here we undertake the predictive analytics analysis step, which requires model development effort and then model validation for the example. To complete the predictive analytics analysis, forecasts of warehouse demand are determined.

8.4.1. Developing the Forecasting Models

The descriptive analytics analysis has suggested a course of action in identifying appropriate forecasting models in this next step of the BA process. To ensure the best possible forecasting models and confirm the descriptive analytics analysis results, the curve-fitting feature (Curve Estimation function) of SPSS will be utilized. Each of the six customer demand data files is analyzed through the SPSS program to generate potential regression models, as presented in Tables 8.5 through 8.10.

Table 8.5 SPSS Curve-Fitting Analysis for Kansas City Motor Demand Forecasting Model: Model Summary and Parameter Estimates

Table 8.6 SPSS Curve-Fitting Analysis for Chicago Motor Demand Forecasting Model: Model Summary and Parameter Estimates

Table 8.7 SPSS Curve-Fitting Analysis for Houston Motor Demand Forecasting Model: Model Summary and Parameter Estimates

Table 8.8 SPSS Curve-Fitting Analysis for Oklahoma City Motor Demand Forecasting Model: Model Summary and Parameter Estimates

Table 8.9 SPSS Curve-Fitting Analysis for Omaha Motor Demand Forecasting Model: Model Summary and Parameter Estimates

Table 8.10 SPSS Curve-Fitting Analysis for Little Rock Motor Demand Forecasting Model: Model Summary and Parameter Estimates

Reviewing the R-Square values for each of the potential curve-fitting models, it turns out that the cubic model is the best fitting for all six data files. It is not surprising, in the cases of Houston and Little Rock, where the descriptive analytics graphs clearly show typical cubic (or quadratic) function behavior, that the only significant (F-ratio, p

Nonparametric Tests > Related Samples as a navigation path. This particular set of functions helps the BA analyst by allowing the SPSS software to determine the best test to select for the analysis. For illustration, a comparison of the Sales 3 and 4 distributions is presented in Figure A.13. Note that SPSS chooses the Wilcoxon Signed Rank Test from those in Table A.4 as the best choice for this analysis. Note also that the significance level of 0.05 is an automatic default (which can be changed), and that the computed significance level, p = 0.014, is less than 0.05. Thus, the decision is to reject the null hypothesis. There is a significant difference in the two distributions between Sales 3 and 4.

Figure A.13 Nonparametric test of Sales 3 and 4 example

B. Linear Programming

B.1. Introduction

Linear Programming (LP) is a deterministic, multivariable, constrained, single-objective optimization methodology. It’s a model with known, deterministic, and constant parameters, and it has more than one unknown or decision variable. LP has mathematical expressions that constrain the values of the decision variables, and it seeks to solve for an optimal solution with a single objective. It is a general-purpose modeling methodology, permitting application to just about every possible problem situation that fits the assumptions the model requires. (We will discuss the assumptions of the LP model in a later section of this appendix.) Specifically, LP can be used to model problems in all the functional areas of business (accounting, economics, finance, management, and marketing) and in all types of operations (industry-wide, government, agriculture, health care, and so on). Modeling a problem using LP is called programming. As such, LP is considered one of several mathematical programming methodologies available for use in the prescriptive step of the business analytic process.

B.2. Types of Linear Programming Problems/Models There are basically two types of LP problems/models: a maximization model and a minimization model. Some business situations seek to maximize profit or sales; in such cases, the single objective is maximization. Other business situations seek to minimize costs or resource utilization. In those cases, the single objective is minimization. In addition to these two basic types of LP models, there is a group of special-case models. These models are also maximization or minimization models, but they are applied to a limited set of problems. One example is integer programming (discussed in Appendix D, “Integer Programming”), whose model solution requires integer values rather than real number solutions.

B.3. Linear Programming Problem/Model Elements B.3.1. Introduction All LP problem/model formulations consist of three elements: an objective function, constraints, and nonnegativity or given requirements. The generalized model (a model without actual values, only symbols) requires the three components presented in Exhibit A. Note that the applied model is also presented in Exhibit B. Both models will be discussed in this section. The exhibit used here foreshadows the formulation of models discussed in this appendix. A. Generalized LP Model

B. An Applied LP Model (Ford Motor Company problem, explained later in this appendix)

B.3.2. The Objective Function The objective function is generally expressed as one of the following:
Maximize: Z = c1 X1 + c2 X2 + ... + cn Xn
or
Minimize: Z = c1 X1 + c2 X2 + ... + cn Xn
where:
Z = an unknown that is not a decision variable but whose value is determined once the values of the decision variables are known
Xj = decision variables for j = 1, 2, ..., n; these are the unknowns to be solved for at an optimal value

cj = contribution coefficients for j = 1, 2, ..., n, which represent the per-unit contribution to Z for each unit of the decision variable to which they are related
The objective function is always an equality expression with the same form and style as the preceding two. In this book, the coefficients are always positive (although in some real-world problems, they can be negative). If in a problem the single objective is maximizing profit, use the Maximize Z function. If the objective is to minimize costs, use the Minimize Z function. This objective function can be illustrated by a simple problem. Suppose one wants to decide how many automobiles a Ford Motor Company plant should produce in a week. The plant is capable of producing only two types of automobiles: Mustangs and Thunderbirds. So the decision variables in this LP model will be as follows:
X1 = number of Mustangs to produce per week
X2 = number of Thunderbirds to produce per week
The plant would not produce automobiles unless it could make some profit from the endeavor. Suppose it could make $1,000 on each Mustang and $3,500 on each Thunderbird. These values (1,000 and 3,500) represent each automobile’s per-unit profit contribution to what will be the total profit (Z) and are the contribution coefficients c1 and c2 in the model. The resulting objective function for this problem would be this:
Maximize: Z = c1 X1 + c2 X2 (generalized form)
Maximize: Z = 1000 X1 + 3500 X2 (applied form)
If this objective function had no constraints to limit the size of the decision variables, they could be set at positive infinity to make as much profit as possible. Unfortunately, in the real world, there are always constraints to limit the optimization effort.

B.3.3. Constraints The constraints in an LP model can generally be expressed as the following:
subject to:
a11 X1 + a12 X2 + ... + a1n Xn ≤ b1
a21 X1 + a22 X2 + ... + a2n Xn ≥ b2
...
am1 X1 + am2 X2 + ... + amn Xn = bm

where:
bi = a right-hand-side value for i = 1, 2, ..., m, where “m” is the number of constraints in the model, each having a right-hand-side value usually representing a total resource availability or requirement
aij = technology coefficients for i = 1, 2, ..., m and j = 1, 2, ..., n, which represent the per-unit usage of the related ith right-hand-side value by the related jth decision variable
In the constraints, the technology coefficients (aij) are located by row with the first subscript and by column with the second subscript. The term technology coefficient is used to describe this parameter because technology applications are the principal determinant of the size of this coefficient. LP constraints come in only three expressions: ≤, ≥, or =. Some models have only one type of expression for all their constraints; other models can use all three types. How does one know when a particular type should be used? It depends on the related right-hand-side b value. If b is a total maximum value (like the total number of labor hours that at most can be used for production), then use a ≤ expression. If b is a total minimum value (like the total minimum number of labor hours that are contracted for production), then use a ≥ expression. If b is an exact value (like a jeweler who has 20 diamonds and must use exactly 20 diamonds in 20 necklaces), use an = expression. The left-hand-side of the constraint represents resources that produce the decision variable values. The right-hand-side (RHS) represents the amount of resources to be considered in the model. When the model solves for the optimal decision variable values, they have to conform to the limitations posed by these constraints. This is why the constraints begin with the two words “subject to.” The objective function is subject to (or limited by) the constraints.
How many constraints are enough for modeling a problem? The answer depends on the problem. In this appendix, the formulation of constraints is based on available information from word problems. In an actual real-world problem, modelers are guided by available information or data. Like a painter placing paint on a canvas, a modeler adds as many constraints in the model as there is available data to formulate them. LP is a robust model that, in most cases, eliminates or makes redundant constraints that are not needed. But a balance must be struck: as can be seen in a later section on model formulation complications, too many constraints can spoil the formulation if they are formulated incorrectly. Too

few constraints also cause another complication, preventing an accurate model from being formulated. Too many or too few are both examples of incorrect modeling, and it is only through experience that modelers can learn to successfully formulate LP models. The practice provided in this appendix in formulating these constraints will increase model formulation skills in determining how many or how few constraints should be included.
Continuing with the Ford Motor Company problem, in an effort to maximize profit, there are some weekly resource limitations for production. First, at most 10,000 hours of skilled labor are available for use in the production of the Mustangs (X1) and the Thunderbirds (X2). Suppose that it takes 60 hours for each Mustang and 75 hours for each Thunderbird. To model this constraint, we again return to the generalized form:
a11 X1 + a12 X2 ≤ b1 (generalized form)
So the resulting applied first constraint for the model, given the parameters above, is:
60 X1 + 75 X2 ≤ 10000 (applied form)
The value of 60 hours is the per-unit usage of the total 10,000 available hours by each unit of the related Mustang decision variable. When the optimal values for X1 and X2 are determined, the sum of the products of the decision variables and their respective technology coefficients must be less than or equal to the total maximum amount of skilled labor of 10,000 hours. Note in the constraint that there are no commas to denote the thousands. This is because model parameters will eventually be entered into software that does not accept commas in the model formulation.
Suppose this company also has a minimum usage requirement for the skilled labor such that it must use at least 3,000 hours each week due to labor contract requirements. The second constraint for this model would then be as follows:
60 X1 + 75 X2 ≥ 3000
Now suppose this company is under contract to produce exactly 140 automobiles per week to make its quota. This constraint would look like this:
X1 + X2 = 140
Note in this constraint there is an implied technology coefficient of 1 in front of each decision variable.

B.3.4. The Nonnegativity and Given Requirements The decision variables in LP models are required to be zero or some positive value. As a formal part of the correct way of formulating an LP model (as is the case in most of the mathematical programming methods), one must add an additional statement in LP model formulations that looks like this:
and X1, X2, ..., Xn ≥ 0
These do not represent formal constraints on the model, but a limitation on the decision variables. As presented earlier, this tells users that this model requires its decision variables to be zero or any positive value, including real numbers and fractional values. What if one wants to produce whole units of the decision variable values (like whole units of Mustangs)? That requires the solution to generate only integer values. While the subject of “integer programming” will be presented in Appendix D of this book, this additional “given requirement” would have to be included in the model so users would know of its existence. This is done by revising the nonnegativity requirements to also include these given requirements:
and X1, X2, ..., Xn ≥ 0 and all integer
This formal requirement is not necessary to run the model in a computer, but it is required in any formulation of a model. It is the portion of the formulation that tells users who look at the model which kind of software (LP or integer programming) is needed to run it and obtain a solution. In the Ford Motor Company problem, the nonnegativity requirements that permit fractional values for the automobile production could be these:
and X1, X2 ≥ 0
Now revisit Exhibit B.3.1.1 and see how the applied Ford Motor Company problem/model formulation complies with the generalized model formulation. The generalized model coefficients and terms will be used repeatedly.
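Although this appendix relies on LINGO and Excel for solutions, the completed Ford Motor Company model can be checked with almost any LP solver. The following is a minimal sketch using SciPy's linprog function in Python (an assumption of this illustration, not software used in the book). Because linprog minimizes by default, the profit coefficients are negated, and the ≥ labor constraint is converted to a ≤ row by multiplying both sides by −1:

# Hedged sketch: the Ford Motor Company model solved with SciPy's linprog.
# linprog minimizes, so maximize 1000 X1 + 3500 X2 by minimizing its negative.
from scipy.optimize import linprog

c = [-1000, -3500]                 # negated profit contributions

A_ub = [[60, 75],                  # 60 X1 + 75 X2 <= 10000 (maximum skilled labor)
        [-60, -75]]                # 60 X1 + 75 X2 >= 3000 rewritten as -60 X1 - 75 X2 <= -3000
b_ub = [10000, -3000]

A_eq = [[1, 1]]                    # X1 + X2 = 140 (weekly production quota)
b_eq = [140]

# Nonnegativity is the default bound (0, None) on every decision variable.
result = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, method="highs")

print("Mustangs per week     :", round(result.x[0], 2))
print("Thunderbirds per week :", round(result.x[1], 2))
print("Maximum weekly profit :", round(-result.fun, 2))

Because this formulation permits fractional values, the reported production quantities may include fractions of an automobile; forcing whole units is the integer programming case taken up in Appendix D.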

B.4. Linear Programming Problem/Model Formulation Procedure Formulation of linear programming problems requires skill. In this section, we present a stepwise procedure useful in formulating LP problems. In addition, several practice problems and formulations are presented to help build formulation skills.

B.4.1. Stepwise Procedure The hardest part of figuring out any word problem or any real-world problem is always the first step. This stepwise procedure is a strategy for handling any kind of LP model. Big or small, it handles them all by breaking a complex process into small, achievable steps:
1. Determine the type of problem—A problem has to be either maximization or minimization. If the problem only mentions making profit or sales, it is most likely a maximization problem. If the problem only mentions cost, it most likely is a minimization problem. What if a problem includes sales and cost information? Then subtract the cost from the sales and derive profit. Maximizing the resulting profit accounts for both the sales and the cost information. The values that can be used to determine the type of problem are called the contribution coefficients.
2. Define the decision variables—Step 1 determined the type of problem by finding profit or cost contribution coefficients. The number of profit or cost contribution coefficients determines the number of decision variables because these contribution coefficients are attached to the respective decision variables in the objective function. There are two things to remember in defining decision variables: (1) Make clear what the decision variables are determining; (2) State any “time horizon” the problem requires. In the Ford Motor Company example first mentioned in Section B.3.2, the definition of the first decision variable was as follows:
X1 = number of Mustangs to produce per week
This definition makes clear that the “number of Mustangs” will be produced. The definition also includes the time horizon of one week. An example of what is not acceptable in the definition of a decision variable is this:
X1 = Mustangs

3. Formulate the objective function—Because the contribution coefficients, the type of problem in Step 1, and the decision variables in Step 2 have been identified, all that is left is to combine these into the form of an objective function, as presented in Section B.3.2.
4. Formulate the constraints—Introduced in Section B.3.3, this step is one of the hardest. Here are two strategies that can help: (1) Right-hand-side strategy: Look at the problem for a sentence or a column in a table that lists the available resources or requirements that the model must satisfy. These are the right-hand-side “b” parameters. Create a column vector (a column of numbers) that will represent the “b” values in the model. Then go back and read the problem again to find the technology coefficients to finish the left-hand-side of the constraint. (2) Left-hand-side strategy: In problems with tabled values, look to see if they are technology coefficients. Take the technology coefficients and align them by row or column to form the left-hand-side of the constraints. Then go back and read the problem again to find the right-hand-side values.
5. State the nonnegativity and given requirements—Simply use the statement of nonnegativity given in Section B.3.4.
Now practice this formulation procedure on a series of problems. These problems range from very simple to more complex. They are designed for beginners but will help anyone practice developing LP models.

B.4.2. LP Problem/Model Formulation Practice: Butcher Problem Problem Statement: Consider the problem of a butcher mixing the day’s supply of meatloaf. The butcher has two grades of meatloaf: Grade 1 and Grade 2. The butcher needs to know how many trays of each kind of meatloaf should be made. The butcher may make whole trays or any fractional number of trays. The butcher’s profit is increased by $36 for each tray of Grade 1 that is mixed, and by $34 for each tray of Grade 2. If there were no constraints, the butcher would want to make both kinds of meatloaf to maximize profit. Unfortunately, the butcher has constraints that must be considered. • Constraint 1—The butcher cannot sell more than six trays of meatloaf per day. • Constraint 2—Only nine hours of mixing time are available for the butcher and staff. It takes two hours to mix a tray of Grade 1 and one hour to mix a tray of Grade 2.

• Constraint 3—The butcher has only 16 feet of shelf space for meatloaf. Each tray of Grade 1 requires 2 feet of shelf space. Each tray of Grade 2 requires 3 feet of shelf space.
• Formulation—This problem clearly labels the constraints to make things easy. Remember to use the five-step formulation procedure to reduce a problem to easier and smaller steps to create the model.
1. Determine the type of problem—This problem only mentions profit, so it has to be a maximization problem. The two profit contribution coefficients ($36 and $34) in this model determine the type of problem.
2. Define the decision variables—The problem says, “The butcher needs to know how many trays of each kind of meatloaf should be made.” That is one hint. An easier one is in Step 1. The two contribution coefficients mean two decision variables. Because $36 is the amount of profit on a tray of Grade 1 meatloaf, the related decision variable has to be the “number of trays of Grade 1 meatloaf to make or mix per day.” Note the time horizon in the sentence, “Consider the problem of a butcher mixing the day’s supply of meatloaf.” The resulting two decision variables can be defined as follows:
X1 = number of trays of Grade 1 meatloaf to make (or mix) per day
X2 = number of trays of Grade 2 meatloaf to make (or mix) per day
3. Formulate the objective function—The formulation of the objective function follows easily from Steps 1 and 2. It is:
Maximize: Z = 36 X1 + 34 X2
4. Formulate the constraints—Take one constraint at a time. In reading the sentence for Constraint 1 (“The butcher cannot sell more than six trays of meatloaf per day”), six is a parameter in the constraint. It has to be either a technology coefficient or a right-hand-side value. If it is a technology coefficient, it has to be directly related to an individual decision variable. If it is a right-hand-side value, it must be a total available resource. “Six” is a selling limitation on total trays, not individual trays. So it is a right-hand-side or b value. Because it is also a total maximum selling limitation, the direction of the inequality will be less than or equal to. What about the left-hand-side of this constraint? Well, what is the sum of all the trays of meatloaf? That can

be expressed as the sum of both decision variables, resulting in the first constraint of the model that follows:
X1 + X2 ≤ 6 (selling)
It is recommended that beginning LP modelers label their constraints with a word or two so that the modeler can remember that the particular limitation has been included as a constraint. It will also be helpful to others wanting to understand the model if the constraints are labeled with understandable terms.
In Constraint 2, the sentences are, “Only nine hours of mixing time are available for the butcher and staff. It takes two hours to mix a tray of Grade 1 and one hour to mix a tray of Grade 2.” In the first sentence, “nine” is a total available mixing time limitation. So it is a right-hand-side value that represents a total maximum amount of this mixing resource, resulting in a less than or equal to expression. In the second sentence, “two” is attached to the Grade 1 decision variable, and “one” is attached to the Grade 2 decision variable. So these two parameters are technology coefficients. The resulting constraint follows:
2X1 + X2 ≤ 9 (mixing time)
In Constraint 3, the sentences are, “The butcher has only 16 feet of shelf space for meatloaf. Each tray of Grade 1 requires 2 feet of shelf space, and each tray of Grade 2 requires 3 feet of shelf space.” In the first sentence, “16” is the total available shelf space limitation. So it is a right-hand-side value that represents a total maximum amount of this shelf space resource, resulting in a less than or equal to expression. In the second sentence, the “2” is attached to the Grade 1 decision variable, and the “3” is attached to the Grade 2 decision variable. So these two parameters are technology coefficients. The resulting constraint is:
2X1 + 3X2 ≤ 16 (shelf space)
5. State the nonnegativity and given requirements—Because the model has only two decision variables and the problem specifically allows fractional values, all that is needed is to state the same nonnegative requirements as the basic generalized model presented in Section B.3.4 as here:
and X1, X2 ≥ 0
The entire formulation of the butcher problem is again presented here:
Maximize: Z = 36 X1 + 34 X2
subject to:
X1 + X2 ≤ 6 (selling)
2X1 + X2 ≤ 9 (mixing time)
2X1 + 3X2 ≤ 16 (shelf space)
and X1, X2 ≥ 0
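For readers who want to verify this model with software other than LINGO or Excel, here is a minimal sketch using the open-source PuLP library for Python (an assumption of this illustration, not a package referenced in the book). It mirrors the advice above by attaching a short label to each constraint:

# Hedged sketch: the butcher model expressed with the open-source PuLP library,
# with each constraint carrying the same label used in the formulation.
from pulp import LpProblem, LpMaximize, LpVariable, value

model = LpProblem("butcher_problem", LpMaximize)
x1 = LpVariable("grade1_trays", lowBound=0)   # nonnegativity via lowBound
x2 = LpVariable("grade2_trays", lowBound=0)

model += 36 * x1 + 34 * x2, "profit"          # objective function
model += x1 + x2 <= 6, "selling"              # labeled constraints
model += 2 * x1 + x2 <= 9, "mixing_time"
model += 2 * x1 + 3 * x2 <= 16, "shelf_space"

model.solve()
print("Grade 1 trays:", value(x1))
print("Grade 2 trays:", value(x2))
print("Daily profit :", value(model.objective))

Under this formulation, the solver should report three trays of each grade for a daily profit of $210, which can be confirmed by checking the corner points of the constraints by hand.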

B.4.3. LP Problem/Model Formulation Practice: Diet Problem Problem Statement: A diet is to contain at least 10 ounces of nutrient P, 12 ounces of nutrient R, and 20 ounces of nutrient S. These nutrients are acquired from foods A and B. Each pound of A costs four cents and has four ounces of P, three ounces of R, and no S. Each pound of B costs seven cents and has one ounce of P, two ounces of R, and four ounces of S. Desiring minimum cost, how many pounds of each food should be purchased if the stated dietary requirements are to be met?
Formulation, by steps:
1. Determine the type of problem—This problem only mentions costs. Therefore, it must be a minimization problem.
2. Define the decision variables—How many cost values were used in Step 1 to determine the type of problem? Two (four cents and seven cents) are required. So how many decision variables are needed? Two. Because four cents is the cost per pound of food A, the first decision variable follows:
X1 = number of pounds of food A to purchase
Note that there is no time horizon (day, week, and so on) in this problem. So do not put one in. The second decision variable is as follows:
X2 = number of pounds of food B to purchase
3. Formulate the objective function—Note next that “cents” are being used. Some modelers might express the costs in dollars as 0.04 and 0.07, and others might express them as integer values in cents. Note here that they are modeled as cents:
Minimize: Z = 4X1 + 7X2
4. Formulate the constraints—This problem illustrates how the “right-hand-side strategy” for formulating constraints might be helpful. Note

in the first sentence, “A diet is to contain at least 10 ounces of nutrient P, 12 ounces of nutrient R, and 20 ounces of nutrient S,” how the total minimum requirements are listed. These values create a column vector (10, 12, and 20), as presented in the right-hand-side values of the constraints that follow:
4X1 + X2 ≥ 10 (nutrient P)
3X1 + 2X2 ≥ 12 (nutrient R)
4X2 ≥ 20 (nutrient S)

Note how the technology coefficients for food A (the X1 column) can be found in a single sentence, “Each pound of A costs four cents and has four ounces of P, three ounces of R, and no S,” and food B (the X2 column) can be found in a single sentence, “Each pound of B costs seven cents and has one ounce of P, two ounces of R, and four ounces of S.” Because all the constraints had total minimum amounts of nutrients, the resulting expressions are all greater than or equal to.
5. State the nonnegativity and given requirements—Because the model has only two decision variables, all that is needed is to state the same nonnegative requirements as the basic generalized model presented in Section B.3.4:
and X1, X2 ≥ 0
The entire formulation of the diet problem is again presented here:
Minimize: Z = 4X1 + 7X2
subject to:
4X1 + X2 ≥ 10 (nutrient P)
3X1 + 2X2 ≥ 12 (nutrient R)
4X2 ≥ 20 (nutrient S)
and X1, X2 ≥ 0
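As a quick check on this minimization formulation, the model can be run through any LP solver. The sketch below again uses SciPy's linprog function (an assumption of this illustration, not a tool used in the book); because linprog only accepts ≤ rows, each ≥ nutrient constraint is multiplied by −1 on both sides:

# Hedged sketch: the diet model solved with SciPy. The >= nutrient constraints
# are rewritten as <= constraints by multiplying both sides by -1.
from scipy.optimize import linprog

c = [4, 7]                       # cost in cents per pound of foods A and B
A_ub = [[-4, -1],                # 4X1 +  X2 >= 10 (nutrient P)
        [-3, -2],                # 3X1 + 2X2 >= 12 (nutrient R)
        [0, -4]]                 #        4X2 >= 20 (nutrient S)
b_ub = [-10, -12, -20]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
print("Pounds of food A:", round(result.x[0], 2))
print("Pounds of food B:", round(result.x[1], 2))
print("Minimum cost (in cents):", round(result.fun, 2))

The solver should report 1.25 pounds of food A and 5 pounds of food B at a total cost of 40 cents, and the same ≥-to-≤ conversion works for any minimization model in this appendix.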

B.4.4. LP Problem/Model Formulation Practice: Farming Problem Problem Statement: The Smith family owns 175 acres of farmland for breeding pigs and sheep. On average, it takes 0.5 acres of land to support either a pig or a sheep. The family can produce up to a total of 7,000 hours of labor for breeding. It takes 15 hours of labor to breed a pig and 20 hours of labor to breed a sheep. Although the family is willing to breed sheep, they do not want to breed more than 200 sheep at a time. Also, pig breeding is limited to 250. It is expected that each pig will contribute $300 profit, whereas each sheep will contribute $350.
Formulation, by steps:
1. Determine the type of problem—The problem only mentions profit, so it has to be a maximization problem.
2. Define the decision variables—The profit coefficients are attached to pigs and sheep, and there is no stated time horizon, so:
X1 = number of pigs to breed
X2 = number of sheep to breed
3. Formulate the objective function:
Maximize: Z = 300X1 + 350X2
4. Formulate the constraints:
0.5X1 + 0.5X2 ≤ 175 (land)
15X1 + 20X2 ≤ 7000 (labor)
X1 ≤ 250 (pig limit)
X2 ≤ 200 (sheep limit)

5. State the nonnegativity and given requirements: and X1, X2 ≥ 0

B.4.5. LP Problem/Model Formulation Practice: Customer Service Problem Problem Statement: The customer service department of a local department store provides repair services for merchandise sold. During one week, 5 television sets, 12 radios, and 18 electric percolators were returned for repair, representing overload work items. Two repair people are temporarily employed as part-time helpers to deal with the overload work. In a normal 8-hour workday, Person 1 can repair 1 television, 3 radios, and 3 electric percolators. In a normal 8-hour workday, Person 2 repairs 1 television, 2 radios, and 6 electric percolators. Person 1 makes $55 per day, and Person 2 makes $52 per day. The customer service department wants to minimize the total cost of operation, while maintaining good customer relationships. How many days should the two repair people be employed to handle the overload of work during this one week?
Formulation, by steps:
1. Determine the type of problem—The problem only mentions cost, so it has to be a minimization problem.
2. Define the decision variables—The cost coefficients are attached to Person 1 and Person 2. Now this is a “fuzzy” time horizon problem. Are these people being hired for a week? No! They are hired for some unknown number of days to process a week’s overload. So the decision variables in this problem do not need a time horizon other than to say, specifically, they are handling the overload work. It can be written as follows:
X1 = number of days Person 1 should be hired to handle the overload work
X2 = number of days Person 2 should be hired to handle the overload work
3. Formulate the objective function:
Minimize: Z = 55X1 + 52X2
4. Formulate the constraints:
X1 + X2 ≥ 5 (televisions)
3X1 + 2X2 ≥ 12 (radios)
3X1 + 6X2 ≥ 18 (percolators)

5. State the nonnegativity and given requirements:
and X1, X2 ≥ 0

B.4.6. LP Problem/Model Formulation Practice: Clarke Special Parts Problem Problem Statement: The Clarke Special Parts Company manufactures three products: A, B, and C. Three manufacturing centers are necessary for the production process. Product A only passes through Centers 1 and 2; Products B and C must pass through all three manufacturing centers. The time required in each center to produce one unit of each of the three products is noted as follows:

So a unit of Product A takes three hours at Center 1, two hours at Center 2, and zero hours at Center 3. Each center is on a 40-hour week. The time available for production must be decreased by the necessary cleanup time. Center 1 requires four hours of cleanup, Center 2 requires seven hours, and Center 3 requires five hours. It is estimated that the profit contribution is $60 per unit of Product A, $40 per unit of Product B, and $30 per unit of Product C. How many units of each of these special parts should the company produce to obtain the maximum profit? Formulation, by steps: 1. Determine the type of problem—The problem only mentions profit, so it has to be a maximization problem. 2. Define the decision variables—The profit coefficients are attached to Products A, B, and C and have a weekly stated time horizon, so: X1 = number of units of Product A to produce per week X2 = number of units of Product B to produce per week X3 = number of units of Product C to produce per week 3. Formulate the objective function: Maximize: Z = 60X1 + 40X2 + 30X3 4. Formulate the constraints—This problem illustrates that some arithmetic may be needed to derive model parameters. In this case, the right-hand-side b values need to be adjusted for the cleanup time. In a

week, each center starts with 40 hours for production purposes. These then have to be decreased by the cleanup time, as stated in the problem sentences, “The time available for production must be decreased by the necessary cleanup time. Center 1 requires four hours of cleanup, Center 2 requires seven hours, and Center 3 requires five hours.” So, for Center 1 we have 36 hours (40 − 4), for Center 2 we have 33 hours (40 − 7), and for Center 3 we have 35 hours (40 − 5) to formulate the right-hand-side values in each constraint. This problem also illustrates the use of the left-hand-side strategy for formulating constraints. Note how the tabled values are the technology coefficients listed by columns in the constraints that follow:

5. State the nonnegativity and given requirements: and X1, X2, X3 ≥ 0

B.4.7. LP Problem/Model Formulation Practice: Federal Division Problem Problem Statement: The Federal Division has a contract to supply at least 72 engine parts. There are three different production processes for engine parts. The processes require different amounts of skilled labor, unskilled labor, and computer time for machine tools. Any one process is, by itself, capable of producing an engine part.

In the foreign country where they operate their plant, skilled labor costs $8.00/hour, and no more than 288 hours can be obtained. Unskilled labor costs $3.00/hour, and no more than 324 hours can be obtained. Computer time costs $10/minute, and no more than 196 minutes are available. Recommend a course of action. Formulation, by steps:

1. Determine the type of problem—The problem only mentions cost, so it has to be a minimization problem. 2. Define the decision variables—This problem is meant to challenge and build skills in identifying decision variables. What is the variable here? In this problem, only one product, an engine part, is produced. So what is the variable? The variable in this problem is the “process” by which engine parts are produced. So the decision variables become this: X1 = number of engine parts to produce by Process 1 X2 = number of engine parts to produce by Process 2 X3 = number of engine parts to produce by Process 3 Like many firms today that have older technology to produce current products, this problem seeks to make the best use of a combination of old and new process technologies to produce the single product called engine parts. 3. Formulate the objective function—Given the definition of the preceding decision variables as processes, identify the correct contribution coefficients. Because they are directly related to the decision variable definitions, they can be defined as follows: c1 = cost of producing one engine part by Process 1 c2 = cost of producing one engine part by Process 2 c3 = cost of producing one engine part by Process 3 How are these parameters found? Use a little arithmetic to compute them as follows:

The resulting three parameters can then be put in an objective function as follows: Minimize: Z = 46X1 + 60X2 + 43X3 4. Formulate the constraints:

5. State the nonnegativity and given requirements: and X1, X2, X3 ≥ 0 There are additional practice problems in Section B.8.

B.5. Computer-Based Solutions for Linear Programming Using the Simplex Method In this section, we examine computer-based solution methods. The most common method to obtain a solution for an LP model is through the use of the simplex algorithmic method. Although some elements of this methodology are useful in understanding LP solutions, our focus will be on utilizing computer software to generate answers.

B.5.1. Introduction The simplex method is an algebraic methodology based on finite mathematics. Remember determinants or matrix algebra from high school? The simplex method is based on the same mathematical process. The computer will generate a solution using the simplex method, so it is not necessary to know how the mathematical process works. What is important is to understand that the simplex method is an optimization process. So it not only gives an optimal solution, but it internally proves that the solution is optimal. This section seeks to provide additional understanding of the by-products of information that the simplex method’s solution provides. These business analytics are often viewed as being as important as the solution that the LP model is designed to generate.

B.5.2. Simplex Variables The simplex method determines the optimal values for the Xj decision variables and the value of Z. Using the simplex method requires employing three other variables as well:
• Slack variable—A slack variable is used in a less than or equal to constraint to permit the left-hand-side of the constraint to equal the right-hand-side in the beginning of the solution process. It works like this. Given the following constraint:
X1 + X2 ≤ 100

if one wants to express it as an equality, one would have to add an additional variable to take up the slack if the sum of the products is less than the right-hand-side value of 100. The slack variable is added to the left-hand-side of the constraint and rewritten as an equality expression:
X1 + X2 + s1 = 100
For each constraint that is modeled, add a different slack variable. For example, in the farming problem from Section B.4.4, the four constraints can be expressed as simplex equality constraints:
0.5X1 + 0.5X2 + s1 = 175 (land)
15X1 + 20X2 + s2 = 7000 (labor)
X1 + s3 = 250 (pig limit)
X2 + s4 = 200 (sheep limit)

Why does the simplex method require the constraints to be expressed as equalities? In an optimal solution, one might not need to use all the maximum resources (acres of land, labor hours, and so on). If they’re not needed, they become slack resources. As it turns out, the slack variables become as important for managerial decision-making as the decision variables, because slack resources are idle resources that can be reallocated to more profitable production activities.
• Surplus variable—A surplus variable is used in a greater than or equal to constraint to permit the left-hand-side of the constraint to equal the right-hand-side in the beginning of the solution process. Given the following constraint:
X1 + X2 ≥ 100
if one wants to express it as an equality, an additional variable would have to be added to take up the additional or surplus amount if the sum of the products is greater than the right-hand-side value of 100. The surplus variable is added to the right-hand-side of the constraint and rewritten like this:
X1 + X2 = 100 + s1
We then have to subtract the surplus variable from both sides to put it on the left-hand-side, where all variables belong (a constraint must have all the variables on the left-hand-side and a constant “b” value on

the right-hand-side to be a valid constraint). Note the following expression: X1 + X2 – s1 = 100 Unfortunately, the negative sign in front of a variable (even a surplus variable) is not acceptable in the mathematical process of the simplex method. So still another variable will have to be created to temporarily cancel out the negativity of the surplus variable in the simplex process. This third new variable is called an artificial variable and is represented by a capital “A” in the expression that follows: X1 + X2 – s1 + A1 = 100 • Artificial variable—The sole purpose of the artificial variable is just to perform a temporary mathematical adjustment to permit the simplex process to handle the negativity of the surplus variable. Ideally, the artificial variable will never pop up in a model solution. (There will be an LP complication that we will discuss later, where the artificial variable can pop up and prevent an optimal solution from happening.) In summary, the slack and surplus variables are not only necessary for the simplex method to work, but they provide useful information on a resulting solution by explaining deviation from right-hand-side values and revealing excess resources for reallocation.

B.5.3. Using the LINGO and Excel Software for Linear Programming Analysis There are many software applications that can solve LP problems. In this section, we will examine two such software apps: LINGO and the Excel Add-In Solver.
B.5.3.1. Trial Versions of LINGO Software (as of January 2014) LINGO software is a product of Lindo Systems (www.lindo.com). A free demo version of this software is available for a limited time. For purposes of this book, the limited time will be sufficient. For those interested in owning a copy, there is an inexpensive version available through the Lindo website. Microsoft® Windows® versions of LINGO are compatible with Windows 2000, Windows XP, Windows Vista, Windows 7, and Windows 8. To obtain the trial version (useful for this book), complete the following steps:
1. Go to www.lindo.com.
2. Click on the LINGO icon.

3. Click on Download a Trial Version.
4. Click on Download LINGO.
5. Click on the appropriate download for the LINGO version that best works on your computer system.
6. The system may request that you register your copy. Feel free to do it now or later.
To confirm that you downloaded it correctly, enter the LP model using the explanation in Section B.5.3.2.
B.5.3.2. How to Use LINGO to Generate a Solution To use the LINGO software, which incorporates the simplex method, it is necessary to enter the data from the LP problem/model formulation into the computer. The input process and solution interpretation are illustrated using the farming problem from Section B.4.4, as stated again here:
Maximize: Z = 300X1 + 350X2
subject to:
0.5X1 + 0.5X2 ≤ 175 (land)
15X1 + 20X2 ≤ 7000 (labor)
X1 ≤ 250 (pig limit)
X2 ≤ 200 (sheep limit)
and X1, X2 ≥ 0

where: X1 = number of pigs to breed X2 = number of sheep to breed Use LINGO by simply entering a modified version of the LP model formulation employing the following steps: 1. Double-click on the LINGO icon on the desktop (or wherever it’s located on the computer). A blank window opens. This is where to enter the LP model formulation. 2. The farming problem/model formulation should be entered, as stated in Figure B.1.

Figure B.1 Farming problem input into LINGO
Note: (1) Use the terms Max or Min. (2) Do not use the Z parameter in the model formulation. (3) Use an asterisk between parameters and variables. (4) Variables may be whole words or a combination of letters or numbers, but they must not contain spaces or special characters. (5) Do not state “subject to.” (6) Each expression must end in a semicolon (;). (7) Do not state given requirements. (8) The less than or equal to symbol is typed as <= (and greater than or equal to as >=).
3. Click on the LINGO menu option at the top of the window to reveal the SOLVE option. Click it.
4. If anything is incorrectly input, an error statement in the form of a LINGO ERROR MESSAGE window pops up and shows you where the first mistake was made. Recheck the input data, just as it is presented in Figure B.1.
5. Assuming everything is correctly entered and the SOLVE option has been clicked, two windows will pop up. The first window, LINGO SOLVER STATUS, is a summary window to provide details on the iterative process of the simplex method. This is not essential information. Simply click CLOSE and exit the window. The second window presents the solution to the problem. The solution for the farming problem is presented in Figure B.2.

Figure B.2 Farming problem solution by LINGO The SOLUTION REPORT provides the solution for the LP problem. This report has several parts, some of which are explained in Appendix C, “Duality and Sensitivity Analysis in Linear Programming.” For now, focus on just reading the solution to the farming problem. In the OBJECTIVE FUNCTION row, the value of 115000.0 is the optimal value for Z, which in this problem is $115,000. The TOTAL SOLVER ITERATIONS row is just a notification that it took one iteration of the simplex method to generate the solution (not important information at present). The optimal decision variable values are provided in the column headings listed as VARIABLE and VALUE. The REDUCED COST and DUAL PRICE columns can be ignored for now and will be discussed in Appendix C. Each Xj decision variable is listed by row in these columns, and each has its optimal value given in the VALUE column. So, for this farming problem, X1 = 150 and X2 = 200, which means one should breed 150 pigs and 200 sheep to achieve the maximum profit of $115,000. In addition to the optimal Z and decision variable values, the simplex method gives the optimal slack and surplus variable values. This problem only had less than or equal to constraints, so there are only slack variables in each constraint. The optimal slack values are given by row. The rows (1 to 5) are a listing of the five expressions, including the objective function. Ignoring Row 1 for now, the four constraints are identified by Rows 2, 3, 4,

and 5. The optimal slack values for each constraint are found in the SLACK AND SURPLUS column. So, in this problem, S1 = 0, S2 = 750, S3 = 100, and S4 = 0. These numbers can be checked by substituting the optimal decision variable values into each constraint as stated here:
0.5(150) + 0.5(200) = 175 ≤ 175 (land)
15(150) + 20(200) = 6250 ≤ 7000 (labor)
150 ≤ 250 (pig limit)
200 ≤ 200 (sheep limit)

It will always be true that the slack variables will equal the values given in the SLACK AND SURPLUS column to permit the equalities to hold true:
0.5(150) + 0.5(200) + 0 = 175 (land)
15(150) + 20(200) + 750 = 7000 (labor)
150 + 100 = 250 (pig limit)
200 + 0 = 200 (sheep limit)
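As a hedged sketch of the same check in code (an illustration only; the book itself relies on the LINGO printout), the usage and slack of each constraint can be computed directly from the farming problem data and the reported optimum:

# Hedged sketch: verifying the farming problem's slack values with NumPy by
# substituting the optimum (X1 = 150 pigs, X2 = 200 sheep) into each constraint.
import numpy as np

A = np.array([[0.5, 0.5],     # land (acres per animal)
              [15.0, 20.0],   # labor (hours per animal)
              [1.0, 0.0],     # pig breeding limit
              [0.0, 1.0]])    # sheep breeding limit
b = np.array([175, 7000, 250, 200])
x = np.array([150, 200])      # optimal decision variable values from Figure B.2

usage = A @ x                 # left-hand-side usage of each resource
slack = b - usage             # unused portion of each right-hand-side value

for i, (u, s) in enumerate(zip(usage, slack), start=1):
    status = "binding" if s == 0 else "nonbinding"
    print(f"Constraint {i}: usage = {u:.0f}, slack = {s:.0f} ({status})")

The printed slacks (0, 750, 100, and 0) match the SLACK AND SURPLUS column, and the zero-slack rows identify the binding constraints discussed below.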

The fact that 750 hours of labor are not needed is valuable information. One can now reallocate those hours to some other farming activity not specified in the original problem statement. Also, in the first column of numbers, the resulting optimal usage of the right-hand-side values can be seen. These values are obtained by subtracting the resulting slack variable values from the b parameters in the stated formulation of the LP model. In this solution, the actual usage of the b parameters follows:
b1 = 175 (i.e., 175 − 0)
b2 = 6250 (i.e., 7000 − 750)
b3 = 150 (i.e., 250 − 100)
b4 = 200 (i.e., 200 − 0)
Knowing how many units of resources will be used in an optimal production or farming problem is just as important for planning purposes as knowing how many units of product or animal one plans to produce. Here’s a question that comes up: How does one know that the values in the Slack/Surplus column are slack values and not surplus values? Check the direction of the inequalities in the input section of the printout to know for sure. If the direction of the inequalities is less than or equal to, the values in the Slack/Surplus column have to be slack values. If the inequalities are

greater than or equal to, the values have to be surplus. In some problems, these business analytics are what is most important in planning resources. The existence of slack or surplus also reveals which constraints are necessary for a given solution and which constraints are not. A constraint that has zero slack or surplus is called a binding constraint because the constraint is important in determining the solution. Specifically, this constraint’s resources directly constrained the decision variable values. A constraint that has a positive slack or surplus value is called a nonbinding constraint or redundant constraint because it does not impact the solution in any way. In fact, nonbinding constraints can be dropped from a model, and one will find the values of the decision variables will be the same. In the farming problem, the first and fourth constraints are binding, and the second and third are nonbinding.
B.5.3.3. Illustrative Excel Solution Excel uses an add-in called SOLVER to generate linear programming solutions. The add-in is available through OPTIONS in all Excel software. Excel requires a somewhat similar input process, as shown in Figure B.3.

Figure B.3 Farming problem input into Excel Although the exact formulation of this input and the procedures to set it up are not explained here, the idea is to show the similarity of the input and the output as in Figure B.4. For a more detailed set of instructions on data input and SOLVER solution procedures, see Excel HELP using Excel’s SOLVER “Define and Solve a Problem by Using Solver” instructions.

Figure B.4 Farming problem solution by Excel In Figure B.3, the optimal solution values for X1 and X2 appear in Cells B2 and C2 once SOLVER has arrived at the optimal values. Also, the optimized value of Z is found in Column D, Cell D3. The resulting slack or surplus will be presented in Column D, cells D4 to D7. All these solution values can be found in Figure B.4. Excel also provides a more descriptive solution statement in tabular form, as shown in Figure B.5. Compare these values and information to those that LINGO provides in Figure B.2. There is obviously a great deal of similarity. With Excel, the actual slack or surplus is determined by subtracting the related amounts in Column D from the RHS values. The difference is the amount of slack in this problem. (Note that slack is only found in ≤ constraints.)

Figure B.5 Detailed farming problem solution by Excel

Now consider another computer-generated solution for a different problem, such as the minimization problem presented here:

In Figure B.6, the LINGO model input is presented. The LINGO solution is shown in Figure B.7, and the Excel solution is shown in Figure B.8.

Figure B.6 LINGO minimization model input

Figure B.7 LINGO minimization model solution

Figure B.8 Excel minimization model solution In Figure B.7 and Figure B.8, the resulting optimal values for Z and the decision variables are Z = 20, X1 = 0, and X2 = 10. The values of the surplus variables in this problem are S1 = 0, S2 = 5, and S3 = 50. Note that these are surplus variables, because the constraints are all greater than or equal to. The values of the actual b parameters used in the solution are obtained by adding the stated b values from the model formulation and the related surplus variable values. The resulting actual values of b that will be utilized by this solution are b1 = 10 (10 + 0), b2 = 10 (5 + 5), and b3 = 100 (50 + 50). There are additional computer-generated practice problems in Section B.8. The simplex method is a powerful analytic methodology for obtaining solutions from LP problems. Unfortunately, complications can develop that prevent users from obtaining a solution.

B.6. Linear Programming Complications There are complications that can prevent the simplex method from generating a desired optimal solution, or even prevent a problem from being formulated at all. Being aware of these complications and what causes them can help users overcome them. Some of these complications include unbounded solutions, infeasible solutions, blending formulations, and multidimensional variables.

B.6.1. Unbounded Solutions An unbounded solution is not, in fact, a solution. The formulation of the problem is incorrect such that one or more of the decision variable values in the model goes to positive infinity. The resolution of this complication is to reformulate the problem correctly. How does one know the problem is unbounded? Most software packages tell the user when they run the problem. An example of an unbounded problem expressed as LINGO input is presented in Figure B.9, and its nonsolution is presented in Figure B.10. Figure B.11 presents the Excel input and output information.

Figure B.9 Unbounded LINGO maximization model input

Figure B.10 Unbounded LINGO maximization model nonsolution notification

Figure B.11 Unbounded Excel maximization model and nonsolution notification

B.6.2. Infeasible Solutions An infeasible solution is not a solution. The model has been incorrectly formulated in such a way that no solution set could be found to satisfy all the constraints in the model. Unless a solution is at least feasible, it cannot possibly be optimal. The resolution of this complication is to reformulate the problem correctly. How does one know the problem is infeasible? Most software packages will tell the user when they run the problem. An example of an infeasible problem expressed as LINGO input is presented in Figure B.12, and its nonsolution is presented in Figures B.13 and B.14.

Figure B.12 Infeasible LINGO maximization model input

Figure B.13 Infeasible LINGO maximization model nonsolution notification

Figure B.14 Excel input and output information
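A hedged sketch of how another solver reports these two complications follows; both tiny models are deliberately mis-formulated and purely hypothetical, and SciPy's linprog is assumed only for illustration:

# Hedged sketch: detecting unbounded and infeasible formulations with SciPy.
from scipy.optimize import linprog

# Unbounded: maximize X1 (minimize -X1) with only a lower limit (X1 >= 1),
# so the objective can grow toward positive infinity.
unbounded = linprog(c=[-1], A_ub=[[-1]], b_ub=[-1], method="highs")
print("Unbounded model :", unbounded.status, "-", unbounded.message)

# Infeasible: X1 must be both <= 5 and >= 10, so no value satisfies all constraints.
infeasible = linprog(c=[1], A_ub=[[1], [-1]], b_ub=[5, -10], method="highs")
print("Infeasible model:", infeasible.status, "-", infeasible.message)

In both cases the solver returns a nonzero status code and a message instead of a solution, which is the same behavior LINGO and Excel signal through their notification windows.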

B.6.3. Blending Formulations A blending formulation is a formulation complication. In some situations, there may be a need to achieve a blend of two or more items, like mixing ounces of cereal with ounces of fruit to make a new breakfast product. This is accomplished in LP models with constraints. A simple example is a one-to-one ratio between two decision variables. For example, suppose there are several decision variables in a model (among them X2 and X3), but it is desired to have them equal each other in the final optimal solution. How can this relationship be expressed as a constraint to achieve this ratio? Simply, one wants X2 = X3, which is not a constraint with a constant b right-hand-side value. So algebraically it is converted to X2 − X3 = 0 or − X2 + X3 = 0. Either of these two equalities achieves a one-to-one equality for the two decision variables in a solution for an LP model. Another more complex blending formulation might involve mixing unequal parts. For example, suppose one wants to mix two parts of X1 to every one part of X2. The means of formulation of this constraint is achieved by a simple step-wise ratio approach between the two parts:
1. Express the two mixture ratios (two parts of X1 to one part of X2) as equalities:
2 = X1
1 = X2
2. Then set them as ratios:
2 / 1 = X1 / X2

3. Then algebraically cross-multiply to obtain the equation:
2X2 = 1X1
4. Finally, subtract 2X2 from both sides to obtain the desired constraint:
1X1 − 2X2 = 0
This approach can be used to develop as many blending constraints as needed to achieve a desired mixture. It is important to remember, though, that these ratio constraints must be on a one-variable to one-variable basis. One limitation of this process is that there cannot be more than two variables in a constraint at a time, but there can be multiple blending constraints in a single model.
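To see the blending constraint at work, the following hedged sketch adds the two-to-one constraint X1 − 2X2 = 0 to a small, purely hypothetical model (maximize X1 + X2 under a 30-unit capacity limit) and solves it with SciPy:

# Hedged sketch: enforcing a two-to-one blend of X1 to X2 in a tiny hypothetical model.
from scipy.optimize import linprog

result = linprog(c=[-1, -1],                  # maximize X1 + X2
                 A_ub=[[1, 1]], b_ub=[30],    # hypothetical capacity: X1 + X2 <= 30
                 A_eq=[[1, -2]], b_eq=[0],    # blending constraint: X1 - 2 X2 = 0
                 method="highs")

x1, x2 = result.x
print(f"X1 = {x1:.1f}, X2 = {x2:.1f}, ratio X1/X2 = {x1 / x2:.1f}")

Whatever capacity value is chosen, the solution keeps X1 at exactly twice X2, which is what the ratio constraint is designed to guarantee.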

B.6.4. Multidimensional Decision Variable Formulations A multidimensional decision variable is one that has more than one characteristic describing it in its definition. An example of how one might structure and use multidimensional decision variables can be seen in a typical human resource problem. Suppose there are two people who must determine the optimal number of hours they should be scheduled to work over the next seven days. The configuration of this problem can best be expressed in the two-dimensional layout below:
Day 1 Day 2 Day 3 Day 4 Day 5 Day 6 Day 7
Person 1: X11 X12 X13 X14 X15 X16 X17
Person 2: X21 X22 X23 X24 X25 X26 X27

So these variables have two dimensions: a person dimension and a day dimension. The variables are always expressed with the first subscript being the row (the person) and the second subscript being the column (day of the week). These variables could be generalized as Xij = number of hours the ith person should work on the jth day. This type of decision variable permits one to structure constraints in two dimensions. For example, suppose one has to limit the number of hours Person 1 could work in the week to no more than 60. The following constraint would be used:
X11 + X12 + X13 + X14 + X15 + X16 + X17 ≤ 60
Now suppose we also want to limit the number of hours for both employees on a Saturday to no more than 10. This day type of constraint would be as follows:
X16 + X26 ≤ 10
Hence, the solution procedure would seek a value for these variables in both a “person” dimension and a “day” dimension. The number of dimensions used for decision variables is up to the modeler. Remember, each dimension brings with it an almost geometric increase in the number of decision variables in the model.
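When the number of dimensions grows, it is usually easier to generate the Xij variables in code than to write out every subscript by hand. The following is a hedged sketch using the open-source PuLP library (an assumption of this illustration); only the variable layout and the two constraints from the text are shown, so an objective and any demand constraints would still be needed to complete the model:

# Hedged sketch: generating two-dimensional X[i, j] decision variables in code.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum

people = [1, 2]
days = [1, 2, 3, 4, 5, 6, 7]

# X[(i, j)] = number of hours person i works on day j
X = {(i, j): LpVariable(f"X_{i}_{j}", lowBound=0) for i in people for j in days}

model = LpProblem("scheduling_sketch", LpMinimize)

# Weekly limit for Person 1: no more than 60 hours across the seven days.
model += lpSum(X[(1, j)] for j in days) <= 60, "person1_weekly_limit"

# Saturday (day 6) limit for both people combined: no more than 10 hours.
model += X[(1, 6)] + X[(2, 6)] <= 10, "saturday_limit"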

B.7. Necessary Assumptions for Linear Programming Models Five basic assumptions must be met for LP to be used in modeling a situation. These assumptions are also useful in deciding whether LP should be used to model a problem. Here are the five assumptions:
1. Linearity—All constraints and the objective function must be linear. If one has a nonlinear profit or cost function or a nonlinear constraint, other nonlinear programming methodologies must be used. A number of these are available in the literature, including such techniques as quadratic programming, separable programming, and Kuhn-Tucker conditions.
2. Additivity—All the constraints and the objective function must, for any value of the decision variables, add up exactly as modeled. That is, one cannot have synergistic impact, where 2 + 2 = 5. Regardless of the size of the decision variable value, the added values of the coefficients must be the simple sum of the products. If they’re not, use another methodology.
3. Divisibility—In the LP models presented in this appendix, the nonnegativity and given requirements allow the decision variable values to be real numbers or any fractional value. This means that if a decision variable ended up being 0.5, one-half of the profit or cost of that decision variable is exactly what will be received. Also, if the labor hour usage of that decision variable is two, then 0.5 of two means that exactly one hour of labor will be used. Sometimes fractional answers are not realistic. In such cases, use something other than LP—perhaps integer programming (which will be discussed in Appendix D).
4. Finiteness—This requirement simply means that the values of the decision variables must be finite. If they are not finite, they are infinite and, therefore, unbounded.
5. Certainty and a static time period—All of the a, b, and c parameters of an LP model must be known with certainty. We can help ensure this certainty by stating a time horizon or a static time period when the decision variables are defined. The static time period specifies the period over which the answer and the parameters remain true.

B.8. Linear Programming Practice Problems Following are several practice LP problems, followed by their answers. Use these problems to practice the methodologies and concepts presented in this appendix.
1. A small furniture manufacturer produces three different kinds of furniture: desks, chairs, and bookcases. The wooden materials have to be cut properly by machines. In total, 100 machine hours are available for cutting. Each unit of desks, chairs, and bookcases requires 0.8 machine hours, 0.4 machine hours, and 0.5 machine hours, respectively. This manufacturer also has 650 labor hours available for painting and polishing. Each unit of desks, chairs, and bookcases requires five labor hours, three labor hours, and three labor hours for painting and polishing, respectively. These products are to be stored in a warehouse, which has a total capacity of 1,260 sq. ft. The floor space required by these three products is nine sq. ft., six sq. ft., and nine sq. ft., respectively, per unit of each product. In the market, each product is sold at a profit of $30, $16, and $25 per unit, respectively. What is the formulation of this problem to determine how many units of each product should be made to realize a maximum profit?
Answer: Let X1, X2, and X3 be the number of units of desks, chairs, and bookcases to be produced, respectively. Because 100 total machine hours are available for cutting, the production of X1, X2, and X3 should utilize no more than the available machine hours. Therefore, the mathematical statement of the first constraint is in the form 0.8X1 + 0.4X2 + 0.5X3 ≤ 100. Also, no more than 650 labor hours and 1,260 sq. ft. are available for painting, polishing, and storing, respectively. Therefore, these two constraints are in the form 5X1 + 3X2 + 3X3 ≤ 650 and 9X1 + 6X2 + 9X3 ≤ 1,260. Finally, the decision variables must be nonnegative. The complete problem formulation follows:
Maximize: Z = 30X1 + 16X2 + 25X3
subject to:
0.8X1 + 0.4X2 + 0.5X3 ≤ 100 (machine hours)
5X1 + 3X2 + 3X3 ≤ 650 (labor hours)
9X1 + 6X2 + 9X3 ≤ 1260 (warehouse space)
and X1, X2, X3 ≥ 0

The LINGO input data and solution are presented in Figures B.15 and B.16, and the Excel version is presented in Figures B.17 and B.18.

Figure B.15 LINGO practice problem 1 model input

Figure B.16 LINGO practice problem 1 solution

Figure B.17 Excel practice problem 1 model input

Figure B.18 Excel practice problem 1 solution
2. The Riverside Company wants to outsource production of three products: premium toys, deluxe toys, and regular toys. These three different toys can be produced at two different external plants with different production capacities. In a normal day, Outsource Plant A produces 20 premium toys, 30 deluxe toys, and 100 regular toys. Outsource Plant B produces 50 premium toys, 40 deluxe toys, and 60 regular toys. The monthly demand for each is known to be 4,000 units, 3,000 units, and 1,000 units, respectively. The company has to pay a daily cost of operation, which is $50,000 for Outsource Plant A and $40,000 for Outsource Plant B. What is the formulation of this problem to find the optimum number of days of operation per month at the two different plants to minimize the total cost while meeting the demand?
Answer: Let the decision variables X1 and X2 represent the number of days of operation per month at each of the plants. The objective function of this problem is the sum of the daily operational costs in the two different plants expressed as Minimize: Z = 50,000X1 + 40,000X2. The objective is to determine the value of the decision variables, X1 and X2, which yields the minimum total cost subject to the constraints. The production of each of the three types of toys must be greater

than or equal to the specific quantity needed to meet demand requirements. In no event should the production be less than the quantities of each product demanded. Together with the constraints, the problem can be formulated as follows:
Minimize: Z = 50000X1 + 40000X2
subject to:
20X1 + 50X2 ≥ 4000 (premium toys)
30X1 + 40X2 ≥ 3000 (deluxe toys)
100X1 + 60X2 ≥ 1000 (regular toys)
and X1, X2 ≥ 0

The LINGO input data and solution are presented in Figures B.19 and B.20.

Figure B.19 LINGO practice problem 2 model input

Figure B.20 LINGO practice problem 2 solution The Excel input data and solution are presented in Figures B.21 and B.22.

Figure B.21 Excel practice problem 2 model input

Figure B.22 Excel practice problem 2 solution 3. The Turned-On Radio Company manufactures models A, B, and C, which have profit contributions of 8, 15, and 25, respectively, per unit. The weekly minimum production requirements are 100 for model A, 15 for model B, and 75 for model C. Each type of radio requires a certain amount of time for the manufacturing component parts, assembling, and packaging. Specifically, a dozen units of model A require three hours of manufacturing, four hours for assembling, and one hour for packaging. The corresponding figures for a dozen units of model B are

3.5, 5, and 1.5; for a dozen units of model C are 5, 8, and 3. During the forthcoming week, the company has available 150 hours of manufacturing, 200 hours of assembling, and 60 hours of packing time. What is the formulation of the production scheduling problem as a linear programming model? Answer: Let X1 = the number of units of model A to produce, X2 = the number of units of model B to produce, and X3 = the number of units of model C to produce. The formulation of this problem in an LP model is given here:

The LINGO and Excel input data and solution are presented in Figures B.23 through B.25. Note that the fractional values have been converted to decimal values because the LINGO software does not allow special characters, such as a slash, within a parameter value.

Figure B.23 LINGO practice problem 3 model input

Figure B.24 LINGO practice problem 3 solution

Figure B.25 Excel practice problem 3 model input and solution

C. Duality and Sensitivity Analysis in Linear Programming C.1. Introduction This appendix is a continuation of the subject of linear programming (LP). The topics covered in this appendix are duality and sensitivity analysis in linear programming models. These methods are of value in the third step of the BA process: prescriptive analytics.

C.2. What Is Duality? Solving an LP problem as in Appendix B, “Linear Programming,” involves solving for the optimal values of Z, the Xj, and the si. This is known as a primal problem, because an optimal solution is being sought that explains the primary or primal relationship between the Xj and si variables as they consume the right side bi constant parameters. Embedded within this primal solution is a dual solution. Every primal problem has a dual problem and a dual solution. The dual solution provides economic trade-off information (usually expressed in dollars) on the per-unit value of the right side bi resources. For example, suppose one solves an LP model with a Z equal to $10,000 and uses 1,000 hours of skilled labor. Among other things, the dual solution would indicate what dollar value each hour of skilled labor contributes to Z, while accounting for the other resources used. Duality in LP is a means of determining the economic value of right side bi values. It is a by-product of the simplex method and is usually given in the printout of any LP simplex model solution. The focus in this appendix is on locating the dual solution in the computer printout and understanding what it means for business analytics analysis and decision-making purposes.

C.2.1. The Informational Value of Duality What is the value of the information that duality provides? Duality has been used for many purposes. One important application is using duality to determine the economic contribution of resources. That is, determine the dollar contribution each resource is responsible for in generating the total optimized Z value. Another application that has appeared in the literature is its use in cost accounting, where dollar costs can be attributed to the specific resources that are used in the LP model solution. The dual solution is an economic valuation methodology that provides valuations based on the scarcity of resources used in constraining an existing

optimal solution. So, when resources are abundant (a slack resource or a surplus resource), the dual solution assigns a $0 contribution to that resource, even though some of it may be used in an optimal solution. In summary, there are two types of information defined in the dual solution: (1) marginal contribution to Z for each unit of the related right side parameters, bi and (2) marginal cost or loss to Z for each unit of the related decision variables, Xj. In this appendix, the focus is on using the information from the dual solution, rather than formulation or computational effort.

C.2.2. Sensitivity Analysis Dual solutions have limitations. Specifically, the value of a dual variable only remains true for a certain number of units of the related right side value. For example, if in the farming problem (see Appendix B, Section B.4.4) one wanted to increase acres of land from the current level of 175 to 500, one cannot expect to receive a dual value of $600 for all of them. The question then becomes how many units a given right side value can be increased or decreased while still receiving exactly the dual value per unit. This range is found with a subsequent procedure called sensitivity analysis. Sensitivity analysis is a procedure to determine the limitations on parameter changes in a model and their impact on an existing solution. Although there are many different types of sensitivity analyses, the discussion will be limited to two of the most important, which involve changes in the parameters cj and bi. In general, sensitivity analysis can answer questions related to making changes in the parameters (cj and bi), indicate what impact those changes will have on the value of Z, and determine whether the variables in the primal solution will change. Both cj and bi sensitivity analyses generate a relevant range (computed values that define the interval of allowable change in a parameter) over which a parameter can change and the impact on the existing solution can be predicted without rerunning the model. Why not just make a change in a parameter and simulate the outcomes in the model to find out the changes? That deterministic simulation approach can work in some situations, but finding the specific value at which a dramatic change in an existing solution takes place (a particular threshold) could take a very large number of experiments. Sensitivity analysis gives the exact value and does so as a by-product of the original solution

effort in solving the primal LP problem. Fortunately, Excel’s add-in program, Solver, provides a separate Sensitivity Report that contains the computed relevant range values.

C.3. Duality and Sensitivity Analysis Problems Consider four problems: two primal maximization problems and two primal minimization problems.

C.3.1. A Primal Maximization Problem Take the farming problem formulated in Appendix B, Section B.4.4, which was solved using Excel in Section B.5.3 (restated here): Problem Statement: The Smith family owns 175 acres of farmland for breeding pigs and sheep. On average, it takes 0.5 acres of land to support either a pig or a sheep. The family can produce up to 7,000 hours of labor for breeding. It takes 15 hours of labor to breed a pig and 20 hours of labor to breed a sheep. Although the family is willing to breed sheep, they do not wish to breed more than 200 sheep at one time. Also, breeding pigs is limited to 250. It is expected that each pig will contribute $300 to profit, whereas each sheep will contribute $350.

where:
X1 = number of pigs to breed
X2 = number of sheep to breed
The LINGO formulation and data entry are presented in Figure C.1, and the Excel formulation and solution are in Figure C.2.
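Although the appendix reads the dual solution from the LINGO and Excel printouts, the same values can be recovered from most LP solvers. The sketch below assumes Python with a recent version of SciPy (not a tool used in the book); with the HiGHS-based methods, the attribute ineqlin.marginals holds the dual values of the inequality constraints, and their signs are flipped here because the maximization was posed as a minimization.

from scipy.optimize import linprog

# Farming problem: maximize 300*X1 + 350*X2 (pigs, sheep).
c = [-300, -350]                      # negate for minimization

A_ub = [[0.5, 0.5],                   # acres of land
        [15, 20],                     # labor hours
        [1, 0],                       # pig limit
        [0, 1]]                       # sheep limit
b_ub = [175, 7_000, 250, 200]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")

print("Pigs, sheep:", res.x)          # expected 150 and 200
print("Profit Z:", -res.fun)          # expected 115,000

# Because the objective was negated to turn the maximization into a
# minimization, negate the marginals back to obtain the dual prices
# discussed in the text (600, 0, 0, 50 for land, labor, pigs, sheep).
print("Dual prices:", -res.ineqlin.marginals)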

Figure C.1 Formulation of farming problem in LINGO

Figure C.2 Formulation of farming problem and solution in Excel

The dual price values related to each of the four right side values bi that one seeks to determine follow:
• Marginal contribution of one acre of land
• Marginal contribution of one hour of labor
• Marginal contribution of one pig
• Marginal contribution of one sheep
The dual reduced cost values related to each of the primal problem’s Xj decision variables follow:
• Marginal cost or loss of one pig
• Marginal cost or loss of one sheep
It is absolutely necessary to fully understand the components of the primal problem to define the dual variables. Use either the LINGO printout of this solution presented in Figure C.3 or the detailed Excel solution with sensitivity analysis range values in Figure C.4.

Figure C.3 LINGO LP solution to the farming problem with dual solution

Figure C.4 Excel LP solution to the farming problem with dual and sensitivity analysis solution

The Reduced Cost column in Figures C.3 and C.4 lists the dual values associated with the decision variables, and the Dual Price column lists the dual values associated with the right side bi parameters. The constraints are listed by number for reference. To compare the primal and the related dual solution values, both solutions are detailed here:

Note that the slack variables for the primal solution are included. These are important in understanding the resulting values of the dual problem. A few points can be made with this first solution:
• Whenever the primal solution has a positive slack value (s2 = 750 and s3 = 100) (or surplus), the related dual price decision variable (0 and 0) is expected to be zero. This is because, in economic terms, a resource with slack is not scarce; one already has more of it than is needed. So there is no economic value in obtaining more of a resource that will just be relegated to slack.
• In most cases, whenever the primal solution has a zero slack value (s1 = 0 and s4 = 0) (or surplus in the case of a minimization problem), the related dual price decision variable (600 and 50) will be positive. This is because the zero slack (or surplus) means that the constraint is binding and has a direct impact on the existing solution. The amount of the impact per unit is given by the dual decision variable values.
• When the primal solution decision variables are positive (X1 = 150 and X2 = 200), the dual reduced cost value will be zero (0 and 0), indicating there is no cost or loss in producing the amounts of the decision variables suggested in the primal problem solution.

• When the primal solution decision variables are zero (which did not happen in the problem), the dual reduced cost values will always be some positive value, indicating that there is a cost or loss in producing each unit of the related decision variable in the primal problem. In the farming problem, the dual price value of 600 means that the marginal contribution of one acre of land is $600. So, what is a good dual price to pay to buy an additional acre of land? The answer is $600, because that is all it will add to profit. In this problem, each of the 175 acres of land added $600 to total maximized profit of Z = $115,000. The dual solution also works in reverse. If one had to cut back on acres of land by one acre, how would it impact Z? Simply, one would lose $600 for that acre or for as many acres as would be cut back. So, to cut back 10 acres (reducing b1 from 175 to 165), what would the impact on Z be? It would result in a $6,000 reduction ($600 × 10 acres). There are limitations to the number of units that can increase or decrease a right side value such that the dual price remains true. Those will be addressed by sensitivity analysis later. Continuing the farming problem, the dual price values of 0 and 0 mean that the marginal contribution of an hour of labor or an extra pig are both $0. So, what is a good dual price if one was to buy an additional hour of labor or breed an extra pig beyond the 250 unit maximum limit? The answer is $0, because neither will add to Z. In the farming problem, the dual price value of 50 means that the marginal contribution of breeding an extra sheep beyond the maximum limit of 200 is $50. How can one breed an extra sheep when all 175 acres of land have been used? Well, there’s an economic trade-off. To raise an extra sheep, one would have to raise one fewer pig, because there are only 175 acres for breeding. It is known that it takes 0.5 acres of land to breed a pig or a sheep. So, shifting 0.5 acres to breed an extra sheep results in losing the 0.5 acres from a pig. The result is gaining $350 for a new sheep but losing $300 for the pig, or a net economic trade-off of $50 ($350 − $300). Note that it is only in this simple problem that one can easily illustrate the trade-off. The dual solution also works in reverse. Cutting back on sheep breeding by one, how would this impact Z? Simply, one would lose $50 for that sheep and for as many additional sheep as would be cut back. Finally, in the farming problem, the dual Reduced Cost values of 0 and 0 mean that the farmer will incur $0 marginal cost or loss of breeding of the 150 pigs and 200 sheep.

The sensitivity analysis information is available in both LINGO and Excel. The focus here is on Excel’s printout in Figure C.4. The ending columns (Allowable Increase and Allowable Decrease) in the Variables Cells table define the sensitivity analysis boundaries for the cj parameters, and the Constraints table defines the boundaries for the bi parameters. From the farming problem printout, one can determine the cj sensitivity analysis relevant ranges as follows:

The Excel value of 1E+30 is to be viewed as a very large or no-limit number. So, in this problem, one can answer the following types of decision-making questions:
1. What is the lowest profit level on a pig, and will pigs still be bred? (Answer: $0)
2. What is the lowest profit level on a sheep, and will sheep still be bred? (Answer: $300)
3. If the profit on pigs increases to $400, will sheep still be bred? (Answer: No, the sheep variable will drop out of the solution.)
4. If the profit on each sheep drops from $350 to $320, will one still breed sheep? Can the new Z value be computed without rerunning the model? (Answer: Yes to both questions, because the change is within the relevant range. The old Z of $115,000 will decline to $109,000 [a reduction of $30 × 200 sheep].)
5. What if pig profit goes from $300 to $310? Should one breed more pigs? Will more profit be made? (Answer: The solution remains the same, so one should not breed more pigs; even so, more profit will be made. The increase in profit will be $1,500 [$10 × 150 pigs].)
In the farming problem’s Excel printout in Figure C.4, the bi sensitivity analysis relevant ranges can be determined as follows:

So, in this problem, the following types of decision-making questions can be answered:
1. If one must decrease the number of acres of land, how many can be decreased and still be assured of losing only $600 per acre? (Answer: 75)
2. If one has to increase the number of acres of land, how many can be increased and still be assured of a gain of $600 per acre? (Answer: 25)
3. If one has to decrease the maximum number of sheep bred, how many can be decreased and still be assured of losing only $50 per sheep? (Answer: 100)
4. How many hours of labor can be decreased without changing the existing solution of 150 pigs and 200 sheep? (Answer: 750, which are the slack hours.)
Finally, it should be noted that if the cj and bi parameter boundaries defined by the relevant ranges are exceeded, the solution set variables and the solution as a whole break down. In such situations, it is advisable to simply change the parameters and simulate the impact on the solution by rerunning the model in the software.
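As the last sentence suggests, one way to confirm a dual price or relevant-range prediction is simply to change the parameter and re-solve the model. A small sketch of that check, again assuming Python and SciPy rather than the book's software, reduces land from 175 to 165 acres; the dual price of $600 per acre predicts that Z should fall by $6,000, from $115,000 to $109,000.

from scipy.optimize import linprog

def solve_farm(acres):
    """Solve the farming LP for a given number of acres and return max profit."""
    c = [-300, -350]                       # maximize 300*X1 + 350*X2
    A_ub = [[0.5, 0.5], [15, 20], [1, 0], [0, 1]]
    b_ub = [acres, 7_000, 250, 200]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
    return -res.fun

z_175 = solve_farm(175)   # 115,000
z_165 = solve_farm(165)   # 109,000: a drop of $600 per acre removed
print(z_175 - z_165)      # 6,000, matching the dual price prediction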

C.3.2. A Second Primal Maximization Problem Consider the printout of a second sample LP maximization problem presented in Figures C.5 and C.6.

Figure C.5 Formulation of second max problem in LINGO

Figure C.6 Formulation and solution of second max problem in Excel

Define the dual price values for this problem generally as follows:
• Marginal contribution of one unit of b1
• Marginal contribution of one unit of b2
• Marginal contribution of one unit of b3
Define the dual reduced cost values related to each of the primal problem’s Xj decision variables as follows:
• Marginal cost or loss of one unit of X1
• Marginal cost or loss of one unit of X2
The LINGO printout solution for this problem is presented in Figure C.7, and the Excel detailed solution is presented in Figure C.8.

Figure C.7 LINGO LP solution to the second max problem with dual solution

Figure C.8 Excel detailed LP solution to the second max problem with dual solution The printouts can be restated as follows:

The dual price value of 40 means the marginal contribution of one unit of b1 is $40. So, if b1 is increased to 21, one will get an additional $40 added to Z, and if b1 is decreased to 19, one will lose $40 from Z. The other dual price values of 0 and 0 mean that the marginal contribution of increasing either of the other right side values (b2 or b3) will not add anything to Z. This is expected, because there are slack resources in both of these constraints. In this problem, the reduced cost value of 0 is expected because its related X1 is equal to a positive value (20). The reduced cost value of 10 means that each unit of X2 that one decides to produce will cost $10 in maximized Z. (One will lose it.) Because the model’s current solution suggests not to produce X2, one would not, but if a constraint is added to this model such that X2 = 1, and then it is rerun, this model would force the solution to produce one unit of X2. In such a case, one would also lose $10 from the maximized $800.

C.3.3. A Primal Minimization Problem For this first minimization problem, consider a new word problem, which will be called the manufacturing company problem. Problem Statement: A manufacturing company wants to determine how many days per month each of two plants (Plant A and Plant B) should be operated to satisfy at least the minimum market demand on three tires produced in each plant. The number of each tire produced per day and the total minimum demand is given in the following table:

If it costs the manufacturing company $2,500 per day to operate Plant A and $3,500 per day to operate Plant B, what are the optimal number of days each plant should be operated to satisfy the total minimum monthly demand on tires? The LP model formulation of this problem follows:

where:
X1 = number of days to operate Plant A per month
X2 = number of days to operate Plant B per month
The LINGO data entry of this model is presented in Figure C.9, and the Excel data entry is presented in Figure C.10.

Figure C.9 Formulation of primal manufacturing company problem in LINGO

Figure C.10 Formulation and solution of primal manufacturing company problem in Excel

The dual problem solution values in this problem follow:
• Marginal contribution of one premium tire
• Marginal contribution of one deluxe tire
• Marginal contribution of one regular tire
These dual price values are simply the marginal contribution of one unit of a right side resource value. The dual reduced cost values are related to each of the primal problem’s Xj decision variables:
• Marginal cost or loss of one day of operation of Plant A
• Marginal cost or loss of one day of operation of Plant B
As stated before, it is necessary to fully understand the components of the primal problem to define the dual solution values. The LINGO printout of this solution is presented in Figure C.11, and the Excel detailed solution is given in Figure C.12.

Figure C.11 LINGO LP solution to the primal manufacturing company problem with dual solution

Figure C.12 Excel LP solution to the primal manufacturing company problem with dual solution From the printouts, the primal and dual solutions follow:

Note on the printout that the dual price values have negative signs in front of them. Ignore these signs. These denote that the value originates from greater-than or equal constraints. The interpretation of these values is quite similar to a maximization problem. Interpreting this solution for the manufacturing company problem, the dual price value of 37.5 means that the marginal contribution of one premium tire is $37.50. So, what is a good dual price to pay for this tire? The answer is at least $37.50, because that is what it is costing in total minimized Z. The dual

solution also works in reverse. If it is necessary to cut back on production by one premium tire, how would it impact Z? Simply reduce total cost by $37.50 for that one tire or on as many tires as needed to cut back. So, to cut back ten premium tires (reducing b1 from 2,500 to 2,490), what would the impact on Z be? It would be a $375 reduction ($37.50 × 10 tires). In the manufacturing company problem, the dual price value of 0 means that the marginal contribution of adding a deluxe tire is $0. This is expected because there is a surplus in this constraint, making it nonbinding or redundant. Adding or subtracting a deluxe tire from the 3,000 minimum value will have no impact on the existing solution. In the manufacturing company problem, the dual price value of 6.25 means that the marginal contribution to Z of producing an extra regular tire (beyond the 7,000) is $6.25. The dual solution also works in reverse. If it is necessary to cut back on producing regular tires by one, it would reduce Z by $6.25 for that tire or on as many tires as needed to be cut back. Finally, in the manufacturing company problem, the reduced cost values of 0 and 0 mean that the manufacturing company will incur $0 marginal cost or loss for operating Plant A 20 days per month and Plant B 25 days per month. From the manufacturing company problem printout in Figure C.12, it can be determined that the cj and bi sensitivity analysis relevant ranges are as follows:

The interpretation of the relevant ranges is similar to the maximization problem, except Z is a cost function. Increases up to the boundaries and the dual solution trade-offs remain true. Beyond the boundaries invalidates the dual solution values. One interesting bi relevant range value for deluxe tires indicates the lower boundary is No Limit when in fact zero has to be the

lower boundary. The use by Excel here of No Limit or 1E + 30 is simply a default. Users should be aware that zero has to be the boundary to ensure that the nonnegativity requirements of the LP model are valid.

C.3.4. A Second Primal Minimization Problem Consider the printout of the LP minimization problem in Figure C.13 and the Excel version of the problem presented in Figure C.14.

Figure C.13 Formulation of second min problem in LINGO

Figure C.14 Formulation and solution of second min problem in Excel

The dual price values for this problem can generally be defined as such:
• Marginal contribution of one unit of b1
• Marginal contribution of one unit of b2
• Marginal contribution of one unit of b3
The dual reduced cost values related to each of the primal problem’s Xj decision variables follow:
• Marginal cost or loss of one unit of X1
• Marginal cost or loss of one unit of X2
The LINGO printout of this solution is presented in Figure C.15, and the Excel solution is presented in Figure C.16.

Figure C.15 LINGO LP solution to the second min problem with dual solution

Figure C.16 Excel LP solution to the second min problem with dual solution The LP solutions can be restated as follows:

Interpreting this solution, the dual price value of 2 (the negative sign is ignored) means that the marginal contribution of one unit of b1 is $2. So, if b1 is increased from 10 to 11, one will have to add an additional $2 to Z, and if b1 is decreased to 9, Z will be decreased by $2. The other dual price values of 0 and 0 mean the marginal contribution of increasing either right side value will not add anything to Z. This is expected, because there are surplus resources in both of these constraints. In this problem, the reduced cost value of 16 means that for each unit of X1, it will cost $16 in minimizing Z. (One will have to add $16 to Z.) Because the model’s current solution suggests not to produce X1, it will not be produced. However, if a constraint is added to this model such that X1 = 1, and then it is rerun, the solution would be forced to produce one unit of X1. In such a case, Z would be increased by $16, from the minimized value of $20 up to $36. The value of the other reduced cost value of 0 is expected because its related X2 is equal to a positive value (10).

C.4. Determining the Economic Value of a Resource with Duality The procedure for determining the economic value of a resource using duality is quite simple once the primal formulation and the dual solution are known. In the farming problem from Section C.3.1, the dual solution was given as follows:

To determine the economic contribution of each of the four resources (the acres, labor hours, pigs, and sheep constraints), all that is needed is to multiply their marginal contribution coefficient by the actual number of units of each resource in the final solution. In the farming problem, this would be, in the order of each constraint:
Total contribution of acres of land = 175 × $600 = $105,000
Total contribution of hours of labor = 6,250 × $0 = $0
Total contribution of an extra pig = 150 × $0 = $0
Total contribution of an extra sheep = 200 × $50 = $10,000
Total maximized profit (Z) = $115,000
So, acres contribute most of the $115,000 maximized profit in this problem. Note that labor hours contribute nothing. How can this be when 6,250 hours of labor are used in the resulting optimal solution? It can be so because this instance looks only at the economic value of scarcity of resources, not an accounting value based on the actual cost of breeding animals. Because labor hours are slack, they are viewed as a free economic resource that does not impact the solution. And indeed, labor hours did not in any way determine the optimal values of the decision variables in this problem.

C.5. Duality Practice Problems Following are several practice duality problems, each followed by the answer. Use these problems to practice the methodologies and concepts presented in this appendix. 1. A small furniture manufacturer produces three different kinds of furniture: desks, chairs, and bookcases. The wooden materials have to

be cut properly by machines. In total, 100 machine-hours are available for cutting. Each unit of desks, chairs, and bookcases requires 0.8 machine hours, 0.5 machine hours, and 0.5 machine hours, respectively. This manufacturer also has 650 labor hours available for painting and polishing. Each unit of desks, chairs, and bookcases requires five labor hours, three labor hours, and three labor hours for painting and polishing, respectively. These products are to be stored in a warehouse, which has a total capacity of 1,260 sq. ft. The floor space required by these three products is nine sq. ft., six sq. ft., and nine sq. ft., respectively, per unit of each product. In the market, each product is sold at a profit of $30, $16, and $25 per unit, respectively. What is the dual solution of this problem, and what is its interpretation? (Answer: First, start with the formulation of the primal problem. This problem was taken from the Practice Problems in Appendix B.) Let X1, X2, and X3 be the number of units of desks, chairs, and bookcases to be produced, respectively. Because 100 total machine hours are available for cutting, the production of X1, X2, and X3 should utilize no more than the available machine hours. Therefore, the mathematical statement of the first constraint is in the form: 0.8X1 + 0.5X2 + 0.5X3 ≤ 100. Also, no more than 650 labor hours and 1,260 sq. ft. are available for painting, polishing, and storing, respectively. Therefore, these two constraints are in the form 5X1 + 3X2 + 3X3 ≤ 650 and 9X1 + 6X2 + 9X3 ≤ 1,260. Finally, the decision variables must be nonnegative. The complete problem formulation is as follows:

Answering the rest of this problem requires computer usage. What are the optimal values for primal and dual solutions, and what do they mean? Answer: The primal solution is where Z = 4,000, X1 = 100, X2 = 0, X3 = 40, s1 = 0, s2 = 30, and s3 = 0; the dual solution is where the dual prices are 16.667, 0, 1.852, and the dual reduced cost values are 0, 3.444, 0, respectively. What does this dual solution mean? The

marginal contribution of one machine hour is worth $16.667, labor hours are worth $0, and each square foot is worth $1.852. Also, if one decides to produce chairs (even though the primal solution says not to), it will cost $3.444 in profit per unit.

2. The Riverside Company wants to outsource the production of three products: premium zizs, deluxe zizs, and regular zizs. These three zizs can be produced at two different external plants with unique production capacities. In a normal day, Outsource Plant A produces 20 premium zizs, 30 deluxe zizs, and 100 regular zizs. Outsource Plant B produces 50 premium zizs, 50 deluxe zizs, and 60 regular zizs. The monthly demand for each of the zizs is known to be 5,000 units, 3,000 units, and 1,000 units, respectively. The company pays a daily cost of operation of $50,000 for Outsource Plant A and $50,000 for Outsource Plant B. What is the dual solution of this problem, and what does it mean? Answer: Again, formulate the primal problem. Let the decision variables X1 and X2 represent the number of days of operation in each of the plants. The objective function of this problem is the sum of the daily operational costs in the two plants, in the form Minimize: Z = 50,000X1 + 50,000X2. The objective is to determine the values of the decision variables X1 and X2 that yield the minimum total cost subject to the constraints. The production of each of the three different zizs must be greater than or equal to the quantity demanded. In no event should production be less than the quantity of each product demanded. Together with the constraints, the problem can be formulated as follows:

This problem requires computer usage. What are the optimal values for the primal and dual solutions? Answer: Primal solution is: Z = 5,000,000, X1 = 0, X2 = 100, s1 = 0, s2 = 2,000, s3 = 5,000; dual price values 1,000, 0, 0, and dual reduced cost values, 30,000 and 0,

respectively. What does this dual solution mean? Each unit of premium zizs that must be produced will add $1,000 to costs but will add $0 to costs if one produces additional deluxe or regular zizs. Also, if the decision is to use Outsource Plant A, it will add $30,000 per day to the total costs Z.) 3. What are the relevant ranges for the cj and bi parameters from the printout in Figure 3.16? Answer:

D. Integer Programming D.1. Introduction D.1.1. What Is Integer Programming? In the prior appendixes on linear programming (LP), the values of the decision variables were allowed to be any real number, which can include fractional or decimal values. Integer programming (IP), which can also be called integer linear programming, is a special case of LP in which the values of the n (number of) decision variables must be integers (0, 1, 2, and so on). This means that the formulation of the IP problem/model differs from the regular LP problem/model only in regard to the statement of the given requirements of the resulting solution. That is, we change the nonnegativity and given requirements from a set of real numbers:
and X1, X2, . . . , Xn ≥ 0
to the all-integer programming problem/model form:
and X1, X2, . . . , Xn ≥ 0 and all integer
It is possible to solve for a set of decision variable values that includes both integer and noninteger (or real) values. This type of solution is called a mixed integer programming problem/model. In the mixed integer programming problem/model formulation, one would designate which decision variables will be integer and which will not. Consider a four-decision-variable problem in which real (noninteger) values are needed for decision variables X1 and X2, and integer values are needed for decision variables X3 and X4. The nonnegativity and given requirements in such a mixed integer IP model would be as follows:
and X1, X2 ≥ 0; X3, X4 ≥ 0 and all integer
Many business analytics problems require this additional integer restriction on the solution. To optimize modeled problems dealing with assigning people, producing whole units of a product, new product selection, or project selection decisions, choose IP over LP. IP has application in the third step in the BA process: prescriptive analytics.

D.1.2. Zero-One IP Problems/Models Some IP problems require all-integer solutions, and some require mixed integer solutions. Decision variables that must be integer can range from 0, 1, 2, and so on, up to any integer value. That is one type of IP problem/model. In addition, there are other, even more specialized IP models. One is called the zero-one programming (ZOP) model, which restricts the decision variable values to integer values of either zero or one. This changes the nonnegativity and given requirements to this:
and X1, X2, . . . , Xn = 0 or 1
One might think this limitation is so restrictive that there cannot be much use for this model. Yet most day-to-day decision-making involves either a yes or a no. In most ZOP models, the decision variables are used to model this decision as follows:
Xj = 1 (a “yes” decision)
Xj = 0 (a “no” decision)

D.2. Solving IP Problems/Models D.2.1. Introduction The branch-and-bound method is a solution procedure that can solve any IP problem for either all-integer or mixed integer solutions using regular simplex software. LINGO (from Lindo Systems, www.lindo.com) and Excel, both of which rely on the branch-and-bound method to solve IP and ZOP problems, are used in this appendix.

D.2.2. A Maximization IP Problem Consider the following IP problem, which can be solved with the all integer solution using the branch-and-bound method:

LINGO requires integer variables to be identified with additional notation in the model. For all integer variables (and this permits mixed integer opportunities for noninteger variables), an additional designation is given in the expression here: @GIN (variable name);

Using LINGO to solve the maximization problem, the LINGO data entry information is presented in Figure D.1.

Figure D.1 A max IP problem’s data entry for LINGO The IP solution for this problem is presented in Figure D.2.

Figure D.2 A max IP solution from LINGO

Excel has a similar adjustment to create an integer solution. By adding an additional INT constraint, as shown in Figure D.3, to the usual inputs for an Excel LP model, the software generates an all-integer solution, as presented in Figure D.4.
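Outside LINGO and Excel, the same branch-and-bound style of solution can be obtained from other mixed integer solvers. The sketch below assumes Python with SciPy's milp function (available in recent SciPy releases) and uses a small hypothetical model, not the book's example, purely to show where the integrality designation plays the role of @GIN or Excel's INT constraint.

import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical all-integer maximization:
#   maximize 5*X1 + 4*X2
#   subject to 6*X1 + 4*X2 <= 24
#              1*X1 + 2*X2 <= 6
#              X1, X2 >= 0 and all integer
c = np.array([-5, -4])                          # milp minimizes, so negate

constraints = LinearConstraint([[6, 4], [1, 2]], ub=[24, 6])
integrality = np.ones_like(c)                   # 1 = integer variable (like @GIN)
bounds = Bounds(lb=0, ub=np.inf)                # nonnegativity

res = milp(c=c, constraints=constraints, integrality=integrality, bounds=bounds)
print("Integer solution:", res.x, "objective:", -res.fun)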

Figure D.3 A max IP problem data entry for Excel

Figure D.4 A max IP solution from Excel

D.2.3. A Minimization IP Problem Consider the following IP problem that can be solved for the all integer solution using the branch-and-bound method:

Using this model, one can enter the problem in LINGO as presented in Figure D.5 and obtain the solution in the printout in Figure D.6.

Figure D.5 A min IP problem data entry for LINGO

Figure D.6 A min IP solution from LINGO

To allow one of the decision variables to be a real number and thereby achieve a mixed integer solution, all that is needed is to remove that variable's @GIN statement from the data entry for the model. The Excel solution for this problem is presented in Figure D.7.

Figure D.7 A min IP solution from Excel

D.3. Solving Zero-One Programming Problems/Models As previously stated, a ZOP model requires the values of its decision variables to be either zero or one. In the literature, there are enumeration methods used to solve ZOP models/problems, but these methods are beyond the scope of this book. LINGO and Excel are used here to solve zero-one programming problems. To illustrate the solution process, consider a ZOP maximization model like this:

LINGO requires zero-one variables to be identified with additional notation in the model. For all zero-one variables (and this permits mixed zero-one variable capabilities), an additional designation is given in the expression here: @BIN (variable name); Using LINGO to solve the maximization problem, the LINGO data entry information is presented in Figure D.8, and the solution is presented in Figure D.9.
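A comparable sketch for zero-one variables, again assuming Python with SciPy's milp and a hypothetical yes/no project-selection model rather than the book's example: restricting an integer variable to bounds of 0 and 1 plays the same role as @BIN in LINGO or the BIN option in Excel.

import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical zero-one (yes/no) selection problem:
#   maximize 60*X1 + 40*X2 + 50*X3   (value of three candidate projects)
#   subject to 30*X1 + 20*X2 + 40*X3 <= 60   (a single budget constraint)
#              each Xj = 0 or 1
c = np.array([-60, -40, -50])                    # milp minimizes, so negate

constraints = LinearConstraint([[30, 20, 40]], ub=[60])
integrality = np.ones_like(c)                    # integer variables...
bounds = Bounds(lb=0, ub=1)                      # ...restricted to 0 or 1 (like @BIN)

res = milp(c=c, constraints=constraints, integrality=integrality, bounds=bounds)
print("Yes/no decisions:", res.x, "total value:", -res.fun)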

Figure D.8 A ZOP problem data entry for LINGO

Figure D.9 A ZOP solution from LINGO

Excel also permits zero-one solutions using a BIN option instead of the previously shown INT integer option. It is implemented by creating the additional constraint, as shown in Figure D.10. The resulting zero-one solution is presented in Figure D.11.

Figure D.10 A ZOP problem data entry for Excel

Figure D.11 A ZOP solution from Excel

D.4. Integer Programming Practice Problems What follows are three practice problems, followed by their answers. Use these problems to practice the methodologies and concepts presented in this appendix. 1. (Answer requires use of computer.) What is the solution to this IP problem/model?

2. (Answer requires use of computer.) What is the solution to the IP problem/model that follows?

3. (Answer requires use of computer.) What is the solution to this ZeroOne problem/model?

E. Forecasting E.1. Introduction From the book Alice’s Adventures in Wonderland, there is an exchange between the Cheshire Cat and Alice. Alice asks, “Would you tell me, please, which way I ought to walk from here?” “That depends a good deal on where you want to get to,” said the Cat. “I don’t much care where.” said Alice. “Then it doesn’t matter which way you walk,” said the Cat. Business analytics (BA) help managers learn of opportunities and solutions to problems. Making BA work requires predicting the future, or at least trends into the future. Statistical and quantitative methodologies can be used to explore and aid in understanding basic relationships within data sets. These methods are at the heart of the second step in the BA process: predictive analytics. The purpose of this appendix is to introduce forecasting methodologies that are useful in exploring, conceptualizing, and predicting relationships within data. This appendix begins with a discussion of the types of variation that can be found in data and then presents a number of forecasting methodologies. The approach here will be to use SPSS and Excel software to perform the computations for the analytics.

E.2. Types of Variation in Time Series Data Forecasting in business is usually time related. When business data is matched to time periods, it is called time series data. Time series data can be used in a forecasting model to project future sales or product demand, where time is the predictive variable (also called the independent variable) represented in the appendix by the letter X. One might want to predict sales or product demand. In a time series model, the sales or product demand data is called the dependent variable and is represented in the forecasting models as the letter Y. Time series data can contain numerous unique variations that increase the complexity in forecasting and the resulting model error in predicting Y. This complexity is chiefly due to the types of variation that exist in the time series data. Four common types of variation can be present in time series data: trend, seasonal variation, cyclical variation, and random variation. These types of variation are presented graphically in Figure E.1. One or more of them are present in time series data. For some companies, sales are dominated by a single type of variation. As such, these firms need forecasting

methodologies that can accurately target the long-term (longer than one year) or short-term (less than one year) nature of these variations, as well as their linearity and nonlinearity.

Figure E.1 Types of variation in times series data There are two basic types of time series analysis models: an additive model and a multiplicative model. The additive time series model assumes that actual data is represented by an additive function, where: Yt = Tt + St + Ct + Rt and where:

Yt = the actual value in time period t Tt = the contribution of secular trend to the value of Y in time period t St = the contribution of seasonal variation to the value of Y in time period t Ct = the contribution of cyclical variation to the value of Y in time period t Rt = the random or residual contribution not explained by the other variance components in the value of Y in time period t The multiplicative time series model assumes that the contribution of each variance component is a compounding function of interrelated variance components, such that: Yt = (Tt)(St)(Ct)(Rt) Both the additive and the multiplicative models assume the presence, to a greater or lesser degree, of all four components of variation in time series data. The complex nature of these components of variation in time series data makes forecasting difficult in some situations. To cope with the presence of the multiple components of variation in the data, identify and analyze each component separately.

E.2.1. Trend Trend variation is a long-term linear change in an upward or downward movement of the data. Time series data that has this variation can be characterized by a gradual increasing or decreasing function over a long period of time, as presented in Figure E.1. A long-term linear forecasting method is needed to forecast this variation.

E.2.2. Seasonal Variation Seasonal variation consists of the short-term cyclical highs and lows of behavior during a period of a year or a season that repeats itself year to year. The product sales of recreation and tourism have definite periods of high and low activity during a year. As presented in Figure E.1, this variation is nonlinear and requires a short-term, nonlinear forecasting methodology.

E.2.3. Cyclical Variation Cyclical variation is a long-term version of seasonal variation. Cyclical variation is often described as the boom and bust periods for the economy of a country or an industry’s sales. As seen by the dividing arrows in Figure E.1, there is a typical four-period sequence that an economy cyclically goes through over a long period of time from a depression period (Years 1 to 2), to a recovery period (Years 3 to 5), to a prosperity period (Years 6 to 7), to a recession period (Years 8 or more). Businesses experience similar periods of sales that may or may not be related to general economic conditions.

E.2.4. Random Variation Random variation is the unexplained variation that remains (the residual variation) in time series data after the other types of variation (trend, seasonal, and cyclical) have been removed. All time series data has some kind of random variation. The more random variation that is present in time series data, the more difficult it is to forecast. Indeed, if random variation is the dominant type of variation in time series data (dominant over trend, seasonal, and cyclical variation), forecasting would be nearly impossible.

E.2.5. Forecasting Methods Time series data can have all four types of time series variation or only one or two components. Determining the impact of these different types of variation in forecasting requires the use of several forecasting methodologies. Some methodologies are based on linear methods (for trend variation determination), and other methodologies use nonlinear methods (for seasonal and cyclical variation determination). These forecasting methodologies can identify the different types of variation and allow you to make forecasts. Forecasting models are most useful when time series data has little or no random variation. Unfortunately, real data does not always oblige with neatly linear or nonlinear patterns. If a pattern of variation cannot be easily discerned from time series data, then a more complex forecasting model should be employed. Some of these models will briefly be discussed in the sections that follow. If a trend, seasonal, or cyclical variation pattern in the time series data can be discerned, one might want to use one of several forecasting methods, including a simple regression model (to forecast linear trend), a multiple regression model (to forecast linear trend with several independent variables), or an exponential smoothing model (to forecast nonlinear seasonal or cyclical

variation). In addition, other software modeling techniques can be employed to fit a model to actual data in such a way that it is useful in forecasting.

E.3. Simple Regression Model E.3.1. Model for Trend Regression (the process) allows the creation of a linear model that can be used to express a linear trend and can be used for short- or long-term forecasting. Basically, this mathematical process fits the data points to a linear expression by minimizing the squared vertical distances between the data points and the line. The result is a linear model. A simple regression model (a one-independent-variable model is considered a simple model) can be employed to project a trend. The simple regression model seeks to regress data points (X and related Y points) to a single, linear expression. In a simple regression model, convert raw sales data (the Y variable) over time (the X variable) into a linear equation such that:
Yp = a + bX
where:
Yp = the forecasted or predicted value of the dependent variable of trend
a = vertical axis intercept value
b = slope of the trend line (denoting direction and rate of trend)
X = the independent variable, usually time in years or units of time for trend
The model parameter b provides the direction and rate of trend for each period of time (X) used in the model. If b is positive, Y is positively related to time or X (as X goes up in value, Y will go up). If b has a negative sign, Y is negatively related to X (as X goes up, Y goes down). Because there will be no manual computing of the values of a and b in the simple regression model above, the formulas are omitted. To use the model for predicting trend, the b slope value can be observed as being either positive (a positive, increasing trend into the future) or negative (a negative, decreasing trend into the future). Using the model for forecasting only requires selecting a time period in the future (X) and plugging it into the model to generate a forecast value (Yp) for that particular time period.
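Because the appendix relies on Excel and SPSS for the computations, the mechanics can also be illustrated with a short Python sketch (an assumption of this example; not a tool used in the book). The monthly sales figures below are hypothetical, since the full Figure E.2 data set is not reproduced in the text; NumPy's polyfit performs the least-squares fit of a and b.

import numpy as np

# Hypothetical monthly time series (X = month number, Y = sales in $000s).
X = np.arange(1, 13)
Y = np.array([13.9, 14.2, 13.8, 14.9, 15.1, 14.7,
              15.6, 15.4, 16.0, 16.3, 16.1, 16.8])

# Least-squares fit of Y = a + b*X; polyfit returns [b, a] for degree 1.
b, a = np.polyfit(X, Y, deg=1)
print(f"Yp = {a:.3f} + {b:.3f} X")

# Forecast a future period by plugging its X value into the model.
x_future = 13
print("Forecast for month 13:", a + b * x_future)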

E.3.2. Computer-Based Solution Assume a company wants to develop a model that will reveal its sales trend and be useful in forecasting or predicting sales. For now we will limit discussion to just the predictive variable, Time, and the dependent variable, Sales. We will assume that there are 20 months of sequentially related actual sales, presented in Figure E.2, on which to develop a model.

Figure E.2 Sales and other data for forecasting model development The data from Figure E.2 can be entered using the Regression option of Excel’s Data Analysis add-in. The resulting simple regression model is computed, along with other useful statistics as presented in Figure E.3. Additionally, charts can be provided to conceptualize the spread of actual data around the predicted linear regression model, as shown in Figure E.4.

Figure E.3 Excel simple regression model statistics

Figure E.4 Excel chart of regression model and actual sales data SPSS also provides similar regression model statistics, as shown in Figure E.5.

Figure E.5 SPSS simple regression model statistics Based on the SPSS printout (the same as the Excel printout), this sales problem simple regression model can be read as follows: Yp = 13280.689 + 279.525X where: Yp = the forecasted or predicted sales value for whatever time period (for example, one month) X = any time period (any month) in the future from which a forecast is desired

Because the value of b is positive (279.525), the slope is positive, and that indicates an increasing function for sales into the future. By plugging the monthly time period values of 21 and 25 into the simple regression model, forecast values can be generated that predict sales into the future for these two time periods. So, one can predict sales of $19,150.714 (13,280.689 + 5,870.025) in the twenty-first time period using this model, or one can look further into the future and predict sales of $20,268.814 (13280.689 + 6988.125) in the twenty-fifth time period. These are estimated average values, not exact ones, because it is known from the variation in Figure E.4 that the simple regression line only approximates the possible trend into the future.

E.3.3. Interpreting the Computer-Based Solution and Forecasting Statistics To use simple regression in trend analysis and forecasting, there are some assumptions that must hold true. Some of these assumptions or rules include the following: • There is a causal relationship between the variables X and Y. • For every value of X, there is a distribution of Y’s that allows for regressing the value of Y for predictive purposes. • The distributions of Y are normally distributed. • The relationship is linear. • As the values of X fall outside the range of the value of X’s used to develop the model, the accuracy of the model will increasingly be in error. To support the use of the information from the simple regression model, there are statistical values and tests that can aid our interpretation of the information. Looking at the SPSS printout in Figure E.5 (Excel provides much of the same information), one can see in the Correlations table that the Pearson correlation is both positive and high (.731 is closer to 1 than 0). This means that as the variable Time increases, so does the forecast value for the dependent variable Sales. Also, the Sig. (1-tailed) test with a p = 0.000 confirms that the relationship between Time and Sales is statistically significant. In the Model Summary table, this correlation is further confirmed with RSquare and Adjusted R-Square statistics. (See Appendix A, “Statistical Tools,” and Chapter 5, “What Are Descriptive Analytics?” for additional

information on the statistical testing mentioned here.) The ANOVA F-test (actual values found in the ANOVA table) compares the regression line values with the actual values (it measures the variation of the actual values from the regressed line). The Sig. F Change column in the Model Summary and the Sig. column in the ANOVA table imply there is a significant relationship between the Time and Sales variables such that Time can significantly predict Sales. The Coefficients table in Figure E.5 is where the a and b coefficients of the simple regression model are located. In addition, t-tests are presented (in columns t and Sig.) that confirm in this case that both the a and the b coefficients are statistically significant. Also, the 95.0% Confidence Interval for B columns provide useful confidence interval statistics for the coefficients (see Chapter 5); for example, one can be 95% confident that the true intercept coefficient lies between 11,734.226 and 14,827.153.

E.4. Multiple Regression Models One of the powerful statistical tools used in forecasting is multiple regression. We introduce this methodology in this section, provide a simple illustrative example, and explain some of the limitations on its use.

E.4.1. Introduction Multiple regression is used to develop a model when multiple independent variables might predict a dependent variable more accurately than the one-independent-variable simple regression model. It is not limited to time-series data but can be used to generate time series forecasts. It is ideal for sorting through possible predictive variables and determining those that should be used and those that should not be used in a forecasting model. The generalized model for a multiple regression model can be presented as such:
Yp = a + b1X1 + b2X2 + . . . + bnXn
where:
Yp = the forecasted or predicted value of the dependent variable
a = vertical axis intercept value
Xi = (for i = 1, 2, . . . , n) the different independent variables
bi = (for i = 1, 2, . . . , n) the proportional contribution of the related independent variable to the forecast of Yp
The selection of the n different variables comes from extensive research of possible predictive variables. These variables can be any

collection or collections of data that have an observable or assumed relationship with Y.
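The same fitting mechanics extend to several independent variables. The sketch below, assuming Python and NumPy rather than the book's SPSS or Excel output and using hypothetical data, builds the design matrix with an intercept column and solves the ordinary least-squares problem for a, b1, and b2.

import numpy as np

# Hypothetical data: predict Sales from Time and a second candidate variable.
time      = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
other_var = np.array([40, 35, 42, 38, 30, 44, 37, 41], dtype=float)
sales     = np.array([13.2, 13.9, 14.1, 14.8, 15.3, 15.2, 16.0, 16.4])

# Design matrix with a column of ones for the intercept a.
X = np.column_stack([np.ones_like(time), time, other_var])

# Ordinary least squares: solve for [a, b1, b2].
coeffs, residuals, rank, _ = np.linalg.lstsq(X, sales, rcond=None)
a, b1, b2 = coeffs
print(f"Yp = {a:.3f} + {b1:.3f} X1 + {b2:.3f} X2")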

E.4.2. Application To illustrate the use of multiple regression, revise the Sales data in Figure E.2, and this time include the additional Sun Spots variable. As previously stated for simple regression, a causal relationship is assumed, but Sun Spots is not likely to be causally related to the Sales variable. Regardless, both variables can be put into the multiple regression model to develop a linear regression model. The SPSS model, which is similar to the Excel model, is presented in Figure E.6.

Figure E.6 SPSS multiple regression printout for sales problem

The multiple regression model coefficients can be found in the B column of the Coefficients table presented in Figure E.6. The resulting multiple regression model can be taken from the printout as:
Yp = 13369.037 + 275.321 X1 – 2.578 X2
When reviewing Figure E.6, it is important to contrast it with the SPSS printout in Figure E.5 for the single-variable model to understand the potential impact of the statistics. We can see in the Correlations table, column Sales, that the Pearson correlation is the same for the Time variable (.731), but the new Sun Spots variable is only –.314. Although Time is still statistically significant at a p = 0.000, the Sun Spots variable with a p = 0.089 is not statistically significant at a cutoff value of p = 0.05. In the Model Summary table, the correlation coefficients are presented for the model as a whole. Interestingly enough, in multiple regression, the addition of variables, even poor predictive variables, can increase the overall model’s correlational values. In this sales problem, the model as a whole remains statistically significant. However, the slight increase in R-Square over the single-variable model is countered by a reduction in the Adjusted R-Square (down from .509 to .481) and an increase in the Std. Error of the Estimate from 1,584.582 to 1,629.394. More error is never a good thing in forecasting. Note also that the Sig. F Change test value is now larger than 0.001, whereas with only the one-variable model, Time was more significant at 0.000. An increase in the variance that the variable Sun Spots brought into the model caused the difference in the F-test. The Coefficients table in Figure E.6 is where the a, b1, and b2 coefficients of the multiple regression model are. In addition, t-tests presented (in columns t and Sig.) confirm in this case that both the a and b1 coefficients are statistically significant. The Sun Spots b2 coefficient is not statistically significant. As a result, it is strongly suggested that the Sun Spots variable be excluded from the model because it will bring greater variance into the forecasting and prediction efforts.

E.4.3. Limitations on the Use of Multiple Regression Models in Forecasting Time Series Data The use of multiple regression models in time series forecasting has limitations. Applying the model beyond the ranges of its independent variables (like time) can violate the assumptions previously stated for the simple regression model. Moreover, there are numerous other mathematical conditions that make forecasting with multiple independent variables risky at best. There are lag effects between independent variables that can falsely lead researchers to assume they have a fairly accurate model by bloating the correlation coefficients, when in fact the independent variables may only be correlating among themselves, not with the dependent variable they seek to forecast. A common statistical test used in regression analysis is called the Durbin-Watson Autocorrelation Test, which tests for the lagged cause-and-effect relationship between the variables. It measures the residual errors when comparing forecast values with actual values. Ideally, there should be no autocorrelation present. If there is autocorrelation present in the model, the relationship between the variables in the model is not accurately expressed. The Durbin-Watson d-test computes a statistic d value (similar to a t-test or Z-test) that can range from 0 to 4. The closer the value is to 2, the less residual correlation (autocorrelation) is assumed. The closer the value is to 0, the stronger the positive correlation of the residuals. The closer the d-test statistic is to 4, the stronger the negative correlation of the residuals. As can be seen in Figure E.6, the Durbin-Watson d-test statistic is 2.837 for the multiple regression model. That suggests a slight negative correlation of residuals, but not much autocorrelation. The Durbin-Watson test is just one of many tests that can be run to lend validity to the use of multiple regression.
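The Durbin-Watson d statistic itself is straightforward to compute from a model's residuals using the standard formula d = Σ(et − et−1)² / Σet². A minimal Python sketch (an assumption of this example; the residual values shown are hypothetical):

import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson d statistic: values near 2 suggest little autocorrelation,
    values near 0 positive autocorrelation, values near 4 negative autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Example with hypothetical residuals from a fitted regression model.
print(durbin_watson([1.2, -0.8, 0.5, -1.1, 0.9, -0.4, 0.7, -0.6]))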

E.5. Simple Exponential Smoothing E.5.1. Introduction Seasonal and cyclical variations are nonlinear functions of variation. To forecast data that is dominated by these nonlinear functions, a nonlinear forecasting methodology is needed. One such forecasting methodology is exponential smoothing. An exponential smoothing model, like the name implies, smooths raw data to reveal nonlinear behavior. This smoothing is accomplished by

mathematically weighting the inaccuracy of a prior forecast in an effort to generate a new forecast. In other words, exponential smoothing models allow the mathematical weighting of a prior, inaccurate forecast in an effort to make a more accurate forecast in the future. Because the model is limited to forecasting only one time period into the future, it is viewed as a short-term forecasting model and can be useful in identifying cyclical and seasonal behavior in data. The formula for the simple exponential smoothing forecasting model follows:
Ft = Ft−1 + α(At−1 − Ft−1)
where:
Ft = exponentially smoothed forecast value for time period t
Ft−1 = forecast value for the prior time period t − 1
At−1 = actual value for the prior time period t − 1
α = an alpha weight ranging from 0 to 1
The values of Ft−1 and At−1 for the first forecast value, F1 (that is, F0 and A0), are usually assumed values or an average of some prior set of data. The effect of these assumed F0 and A0 values will eventually be averaged out by this forecasting model, so the selection can be arbitrary if the number of forecast values n is significantly large. The number of periods t to use in a model at one time, as well as the value of α in the formula, is experimental and must be determined by selecting the best combination that minimizes forecasting error. This is usually accomplished by trial-and-error methods, where various values for the parameters are substituted into the model and the results simulated, whereby a comparison can be made of accuracy statistics (see Section E.8) to find the best α to use in the formula or the number of time periods to run before the desired forecast value can be assumed to be generated.

E.5.2. An Example of Exponential Smoothing

Given the Time and Sales data presented in Figure E.7, one can use exponential smoothing to reveal the nonlinear cyclical (if Time is in years) or seasonal (if Time is in months or weeks) variation in the data. A larger α places more weight on the most recent actual sales value, resulting in little smoothing. (Note the chart in Figure E.7.) Alternatively, a smaller α produces a more smoothed function of the Sales values and reveals two nonlinear cycles in the data rather clearly. Note also that the general trend appears to be upward for the α = 0.1 function.

Figure E.7 Exponential smoothing sales problem

One can also use the exponential smoothing results to compute a forecast value one time period out. By plugging in the last forecast value and the actual value for the twentieth time period, the F21 forecast can be derived as follows:

Ft = Ft−1 + α(At−1 − Ft−1)
F21 = F20 + 0.1(A20 − F20)
F21 = 15707.71 + 0.1(19864 − 15707.71)
F21 = 16123.339

E.6. Smoothing Averages

E.6.1. Introduction

A collection of averaging methods is available to forecasters to deal with the type of nonlinear data common in seasonal and cyclical variation. Some of these averaging methods include weighted moving averages. These methods seek to smooth out variations present in the data to reveal the nonlinear behavior. The forecasting model formula for a weighted moving average follows:

Ft = (w1)Yt−1 + (w2)Yt−2 + … + (wk)Yt−k

where:

Ft = the forecast value in time period t
Yt−1 = the actual value in the time period just prior to time period t
Yt−2 = the actual value two time periods prior to time period t
k = the number of values to average at one time
wi = mathematical weights such that the sum of the weights equals one

If the wi weights are all equal, the weighted moving average becomes a simple moving average, moving forward one t time period at a time. Question: Given the following sales data, what is the forecast of sales for time period 5 using a two-value (k = 2) moving average with equal weights of 0.5?

Answer: For the two-value average, one needs only the last two sales values, for time periods 3 and 4:

F5 = (w1)Y4 + (w2)Y3
F5 = (0.5)(78) + (0.5)(67) = 72.5
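The same calculation can be expressed in a short Python function. The first two sales values below are made-up placeholders; periods 3 and 4 use the 67 and 78 from the answer above.

def weighted_moving_average(values, weights):
    """Forecast the next period from the k most recent values.
    weights[0] applies to the most recent value; the weights must sum to one."""
    k = len(weights)
    recent = values[-k:][::-1]                       # most recent value first
    return sum(w * y for w, y in zip(weights, recent))

sales = [55, 60, 67, 78]                              # periods 1-2 are placeholders; 3-4 match the example
print(weighted_moving_average(sales, [0.5, 0.5]))     # 72.5, as in the answer above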

E.6.2. An Application of Moving Average Smoothing

Returning to the sales problem data in Figure E.2, two-value moving averages and five-value moving averages (this assumes equal weighting in the smoothing model) can be computed. The Excel printout for these two smoothing functions is presented in Figure E.8.

Figure E.8 Moving average smoothing for two- and five-value averages

Note in Figure E.8 how different the five-value moving average is from the two-value. The five-value moving average provides the smoothest function, but it also shifts the smoothed function away from the time periods in which the actual sales originate. The greater the k, the greater the shift away from the actual time period. This is a cost of using a smoothing methodology. Yet it does help to reveal potential nonlinear cyclical or seasonal variations much better than just looking at the raw data. Indeed, this kind of methodology can be used in the descriptive analytics step of the business analytics process, while also having value in identifying important behavior for the predictive analytics step.
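If the data is held in a pandas Series, two- and five-value moving averages like those in Figure E.8 can be reproduced in a couple of lines. The sales values below are placeholders rather than the Figure E.2 data.

import pandas as pd

sales = pd.Series([100, 112, 98, 120, 131, 118, 140, 152, 139, 160])  # placeholder values

two_value = sales.rolling(window=2).mean()    # two-value moving average
five_value = sales.rolling(window=5).mean()   # five-value moving average (smoothest, but with the most lag)

print(pd.DataFrame({"Sales": sales, "MA2": two_value, "MA5": five_value}))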

E.7. Fitting Models to Data

One of the many computer-based features that SPSS offers is a model-fitting function called Curve Estimation. The function is located in SPSS in ANALYZE>REGRESSION>CURVE ESTIMATION. A more limited, but comparable, methodology can be found in Excel's Trendline tool. This function permits data of any kind, including time series data, to be fit by the software to a variety of models with differing mathematical expressions. It utilizes regression modeling to fit the data to a collection of potential models, each with unique mathematical characteristics. This feature permits both linear and nonlinear functions to be regressed. It also permits users to detect all types of time series variations and develop models to help predict them. In Figure E.9, the data entry window allows the selection of 11 different mathematical expressions to be fitted to the variable data (in this case, the Sales and Time data). In addition to fitting the data to a particular mathematical expression, the function provides statistical testing information on the model's usefulness in judging accuracy. Each of the 11 mathematical expressions requested receives the same type of statistical information, on which the best model can be selected. For information on the structure and definitions of these 11 functions, see the SPSS Help window (/help/index.jsp?topic=/com.ibm.spss.statistics.help/overvw_auto_0.htm).

Figure E.9 SPSS curve-fitting data entry window

In more complex models, the statistics are adjusted to that particular type of mathematical expression, such as the quadratic regression model printout included in the tables in Figure E.10 and the chart in Figure E.11. The quadratic regression model can be found from Figure E.10 to be this:

Yp = a + b1X + b2X² = 12783.181 + 415.209X − 6.461X²
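A comparable quadratic fit can be sketched with NumPy's polyfit function. The Time and Sales arrays below are simulated placeholders, so the fitted coefficients will only approximate those reported in Figure E.10.

import numpy as np

time = np.arange(1, 21)                               # placeholder Time values (periods 1-20)
sales = (12783 + 415 * time - 6.5 * time**2
         + np.random.normal(0, 300, size=time.size))  # placeholder Sales values with noise

b2, b1, a = np.polyfit(time, sales, deg=2)            # fits Yp = a + b1*X + b2*X^2 (highest power first)
print(a, b1, b2)

forecast_21 = a + b1 * 21 + b2 * 21**2                # predict one period beyond the data
print(forecast_21)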

Figure E.10 SPSS quadratic regression model printout

Just as illustrated with simple regression, this model and any of the 11 others can be used to forecast or predict future sales. Note in the chart in Figure E.11 that none of the models passes through every data point, but each can be examined in light of its t-test and F-test statistics to determine the best forecasting model.

Figure E.11 SPSS chart of 11 models

E.8. How to Select Models and Parameters for Models

The selection of a forecasting model, or of a parameter in a model (for example, alpha or an independent variable in multiple regression), can be based on a number of criteria. The type of variation (linear or nonlinear) is one criterion. Other criteria include the cost of developing a model, the time it takes to develop it, and the time horizon of the forecast (long-term or short-term). The single most important criterion for making a final selection of a model or a parameter in a model is forecasting accuracy. Although statistical methods like correlation and t-tests provide some measure of variable relationships and their potential to predict values, in forecasting, actual results are vital. Accuracy statistics can help make this selection decision. The most accurate model will be the one that generates the least forecasting error. Several statistics (MAD, MSE, and MAPE) can be computed for any model once it has been developed. In this way, differing models can be compared, and parameters can be selected for use. Following are the formulas for these commonly used forecast accuracy statistics. Experienced forecasters often use a simple statistic called the mean absolute deviation (MAD). The formula for MAD follows:

MAD = Σ|At − Ft| / n

where:

At = actual value in time period t
Ft = forecast value for time period t
n = total number of t time periods being summed in the numerator

The MAD statistic will be zero if the predictive model used to generate Ft perfectly predicts At. As the error in forecasting increases, so will the MAD statistic. When comparing the MADs from different models, or from forecasts based on differing parameters in a model, the smaller the MAD, the more accurate the model. A similar statistic that seeks to capture error in forecasting is the mean square error (MSE), which uses the same principles as the standard error. Here's the formula for MSE:

MSE = Σ(At − Ft)² / n

where:

At = actual value in time period t
Ft = forecast value for time period t
n = total number of t time periods being summed in the numerator

Like the MAD statistic, the smaller the MSE, the more accurate the parameter or model. Another useful error metric is the mean absolute percentage error (MAPE). MAD and MSE are expressed in the units (or squared units) of the data, so they are meaningful only when compared with measures of the same type computed on the same data. MAPE has the relative advantage of presenting error in percentage form, making it possible to judge relative error immediately. Here's the formula for MAPE:

MAPE = (100 / n) × Σ(|At − Ft| / At)
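All three statistics are one-line computations. A sketch with illustrative (not book) data follows; the function names are ours.

def mad(actuals, forecasts):
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

def mse(actuals, forecasts):
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)

def mape(actuals, forecasts):
    # assumes no actual value is zero
    return 100 * sum(abs(a - f) / a for a, f in zip(actuals, forecasts)) / len(actuals)

actual   = [120, 124, 127, 134, 145]          # illustrative values
forecast = [118, 125, 130, 131, 141]
print(mad(actual, forecast), mse(actual, forecast), round(mape(actual, forecast), 2))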

E.9. Forecasting Practice Problems Following are some practice forecasting problems, followed by the answers. Some problems can be solved by manual computation, whereas others require a computer. Use these problems to practice the methodologies and concepts presented in this appendix. 1. (Answer requires use of computer.) A company has had an annual demand of 120, 124, 127, 134, and 145 units, respectively, for the past five years. Using an alpha of 0.2, what is the forecast value for the next year? (Answer: Forecast of 6th period = 128.546) 2. (Answer requires use of computer.) A company has had an annual demand of 120, 124, 127, 134, and 145 units, respectively, for the past five years. Suppose the company wants to develop a forecasting model based on two predictive variables: Time and Index. The Time values are 1, 2, 3, 4, and 5, respectively. The Index values are 120, 135, 148, 158, and 169, respectively. What is the resulting multiple regression model? (Answer: Yp = 256.0505 + 21.8889X1 − 1.3131X2) 3. Three models have been used to generate a forecast. Model 1’s forecast has a resulting correlation coefficient of 0.79, Model 2’s forecast has a resulting correlation coefficient of 0.37, and Model 3’s forecast has a resulting correlation coefficient of 0.89. Which model is the best forecasting model? (Answer: Model 3 has the largest correlation coefficient. Without any other supportive statistics, it appears to be the best.) 4. Suppose sales have been calculated as a dependent variable in a regression model with Index numbers as the independent variable such that the model is Yp = –138.9045 + 2.0201X. Now suppose it is found that next month’s Index value is going to be X = 90. What is the predicted value of Yp? (Answer: Yp = –138.9045 + 2.0201(90), or 42.9045.)

F. Simulation

F.1. Introduction

Mathematical models of probabilistic systems can become extremely difficult to solve. To avoid the complications and limiting assumptions of models like linear programming, simulation can be used to obtain a solution. Once a simulation model is developed and validated, it can be used to answer what-if questions. In the role of business analytics (BA), simulation can predict future events and payoffs. Simulations also permit changes to a system to be explored without risk to the actual system. For example, one can assume a 5, 10, or 15 percent increase in costs in a pro forma income statement to simulate and predict the impact on profits without any risk to the organization.

F.2. Types of Simulation Simulations can be categorized into two types: deterministic and probabilistic.

F.2.1. Deterministic Simulation

A deterministic simulation involves the use of incremental change in a parameter for a predefined model or set of equations. For example, suppose one is interested in seeing the impact on the Breakeven in Units of changing the parameter Price in the breakeven model that follows:

Breakeven in Units = Total Fixed Cost / (Price − Variable Cost)

Set the value for Total Fixed Cost equal to $3,000 and Variable Cost equal to $5. Vary the possible Price values to $6, $7, or $8 using the preceding equation. The Price parameter changes represent the incremental change of a single parameter in the breakeven formula. The resulting simulated changes in Breakeven in Units follow:

Breakeven in Units (for Price = $6) = 3,000 / (6 − 5) = 3,000 units
Breakeven in Units (for Price = $7) = 3,000 / (7 − 5) = 1,500 units
Breakeven in Units (for Price = $8) = 3,000 / (8 − 5) = 1,000 units

The three breakeven values represent deterministic simulated values. They can be used in BA to explore the impact of the three pricing scenarios.
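This deterministic simulation is a simple loop in code:

fixed_cost = 3000
variable_cost = 5

for price in (6, 7, 8):                                    # incrementally change the Price parameter
    breakeven_units = fixed_cost / (price - variable_cost)
    print(price, breakeven_units)                           # 3000, 1500, and 1000 units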

F.2.2. Probabilistic Simulation

A probabilistic simulation occurs when we allow one or more parameters to behave in accordance with a probability distribution. The Monte Carlo Simulation Method is a probabilistic simulation method. It is particularly useful in that it does not require a specific type of probability distribution to be identified to generate a solution. Many of the advanced simulation software systems today require the identification of a specific probability distribution, like those described in Appendix A, "Statistical Tools." Many of the state-of-the-art simulators used in games and training systems are based on the Monte Carlo simulation method.

F.2.2.1. Monte Carlo Simulation Method Procedure

Like all mathematical modeling approaches, the Monte Carlo simulation method requires several steps: 1. Express system behavior as mathematical expressions, and determine the rules and assumptions under which the simulation will be run and what will determine the system's success or failure —A mathematical expression might be a cost function. A rule might be a charge for carrying stock in inventory. An assumption might be to limit the system behavior to a fixed period of time. The system's total cost might determine success or failure. 2. Collect probability distribution information—At least one parameter has to behave in accordance with a probability distribution. There may be dozens of parameters and probability distributions on which to collect data so that the distributions can be modeled into the simulation. 3. Express the probability distribution of each parameter in terms of a discrete distribution—What needs to be done here is to identify parameter behavior that is representative of the distributions collected. For example, sales data collected might range from $0 to $18, and one might choose to place the data into intervals such as $0 to $9 and $10 to $18, which can be called parameter behavior. One could then attach the observed probability to these intervals, as presented in Table F.1.

Table F.1 Pairing of Parameter Behavior and Probabilities

The establishment of the m intervals allows the distribution of each parameter behavior to form a more identifiable probability distribution. 4. Establish a random number assignment system—To establish the random number assignment system, take Table F.1 and add the numbering system to it, as presented and illustrated in Table F.2.

Table F.2 Monte Carlo Numbering System and Illustration This is the heart of the Monte Carlo simulation method. The random number system can, as presented here, be a two-digit system ranging from 00 to 99. The idea is to permit a spread of 100 digits that can be proportioned to 100 percent of the Probability of Behavior column. In the example in Table F.2, there are three intervals of parameter behavior (column 1 in the table). Consider these daily sales that are observed and that one wants to simulate. In column 2 (Prob. of Behavior), a list of the observed frequency of the daily sales is expressed as a decimal. These probabilities have to add to one. In column 3 (Cumulative Prob. of Behavior), the probabilities from

column 2 are added together going down from the first interval. Finally, in column 4 (Random Number System), the digits are allocated in exact proportion to the probability of the behavior starting with 00 and ending with 99. For example, there are 15 digits in the interval between 00 and 14 representing the probability of 0.15 for the Parameter Behavior interval of sales of $0 to $10. Note how the Cumulative Probability determines the upper value in the Random Number System by subtracting one from the cumulative probability (15 − 1 = 14, 35 − 1 = 34, and 100 − 1 = 99). This numbering system simulates behavior. 5. Determine sample size for the simulation run—Sample size can be determined in many ways. In some cases, it can be determined by time (for example, only simulate one year’s worth of behavior). However, more complex statistical techniques can be used that permit statistical confidence to be included. 6. Run the simulation, compute the desired statistics, and make decisions—Simulations are run using computer software. In the Monte Carlo simulation method, a simulation is performed by randomly selecting a number between 00 and 99 and then determining the interval of the parameter behavior where the random number strikes. The statistics that are to be collected are usually defined in Step 1, as are the criteria on which decisions are to be based. F.2.2.2. A Monte Carlo Simulation Application Suppose that a company wants to determine which of two production policies should be used to set its monthly production rate. The two policies from which the company will select follow: • Policy 1—Fixed monthly production rate of 100 units • Policy 2—Flexible monthly production rates, in which next month’s production rate is equal to last month’s actual demand The best policy is the one that will generate the least total shortage and carrying costs over a fixed period of ten months. To conduct this simulation, use the six-step Monte Carlo simulation method. Please note the following steps in this problem: 1. Mathematical expressions, rules, and assumptions of the simulation —The company collected the following rules, assumptions, and cost information: a. The number of units demanded that cannot be satisfied from monthly production is considered inventory shortage. A subcontractor

charges $10 per unit for inventory shortage. b. Units produced in excess of demand, or units carried over from the past month that are not used in the current month, are considered carried inventory units. Carried inventory from one month to the next costs $2 per unit, per month. c. Ten months of demand will be simulated. d. Total Cost = Shortage Cost + Carrying Cost. e. The minimum Total Cost over the ten months determines the best policy. f. Units carried from one month must be added to the next month's supply. g. Under Policy 2, the first month of production will be arbitrarily set at 100 units. 2. Probability information—The company has five possible demand levels: 80, 90, 100, 110, or 120 units per month. The respective probabilities observed when collecting data on the frequency of occurrences follow: a 15 percent chance of 80 units of demand, a 20 percent chance of 90 units, a 25 percent chance of 100 units, a 25 percent chance of 110 units, and a 15 percent chance of 120 units. 3./4. Probability distribution and random number assignment—The table with the random number assignment schedule is presented in Table F.3.

Table F.3 Probabilities and Random Number Assignments 5. Sample Size: Given as 10 months in Step 1. 6. Simulated behavior and statistics of both policies: Both SPSS and Excel provide helpful statistical support to run simulations. Using Excel’s Random Number Function Generator, one is able to load the probability distribution and data from Table F.3 and allow Excel to simulate demand values that can be observed in Figure F.1. Note in this

exhibit that a discrete probability distribution is used because the outcomes are discrete values (though we also incorporate the probability function). If desired, Excel can model the other probability distributions listed in Appendix A in place of the discrete distribution in this example.

Figure F.1 Excel simulated demand The simulated results for Policy 1 are presented in Table F.4, and Policy 2’s results are in Table F.5. Under Policy 1, a fixed production rate of 100 units per month is going to be produced to meet demand. We can see in the Production column that the 100 units are listed for all 10 months. See that the random number of 52 falls in the Random Number System interval (see Table F.3) of 35 to 59. This interval is related to the Parameter Behavior column in Table F.3 of a demand of 100 units. The first demand of 100 units has been simulated. Because Policy 1 has a fixed production rate of 100 units, and the simulated demand is 100 units, there are no Shortage Costs or Carrying Costs. This means there are 0 units, adding $0 to cost. In Month 2, a random number of 80 is drawn, which falls between the Random Number System interval of 60 to 84. That interval is associated with a demand of 110 units. Because the fixed production rate is only 100 units, the demand of 110 results in a shortage of 10 units, or a cost of $100 (10 units × $10). This process of drawing a random number, checking the interval, determining the demand,

and calculating the costs is repeated for all 10 months. The resulting Total Costs for Policy 1 are $480. Now looking at the cost of Policy 2 in Table F.5, note that Month 1's demand becomes Month 2's production rate, Month 2's demand becomes Month 3's production rate, and so on. Using the same random numbers for Policy 2 that were used for Policy 1, the resulting Total Costs are $340. Because the Total Costs for Policy 2 are less than those for Policy 1, select Policy 2 for the operation.
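A minimal Python sketch of the same Monte Carlo logic follows. It uses the Table F.3 probabilities and the cost rules from Step 1, but because it draws its own random numbers, it will not reproduce the exact $480 and $340 totals from Tables F.4 and F.5.

import random

random.seed(1)   # any seed; results vary with the random numbers drawn

demand_levels = [80, 90, 100, 110, 120]
probabilities = [0.15, 0.20, 0.25, 0.25, 0.15]

def simulate_demand(months):
    """Draw a two-digit random number per month and map it to a demand level (Table F.3 logic)."""
    demands = []
    for _ in range(months):
        r = random.randint(0, 99)
        cumulative = 0
        for level, p in zip(demand_levels, probabilities):
            cumulative += int(p * 100)
            if r < cumulative:           # 00-14 -> 80, 15-34 -> 90, 35-59 -> 100, 60-84 -> 110, 85-99 -> 120
                demands.append(level)
                break
    return demands

def policy_cost(demands, fixed_rate=None):
    """Total shortage ($10/unit) and carrying ($2/unit/month) cost over the horizon."""
    total, carried, production = 0, 0, 100            # Policy 2's first month is set at 100 units
    for demand in demands:
        supply = (fixed_rate or production) + carried  # carried units add to the next month's supply
        shortage = max(demand - supply, 0)
        carried = max(supply - demand, 0)
        total += shortage * 10 + carried * 2
        production = demand                             # Policy 2: next month's rate = this month's demand
    return total

demands = simulate_demand(10)
print(policy_cost(demands, fixed_rate=100))   # Policy 1: fixed 100 units per month
print(policy_cost(demands))                   # Policy 2: flexible production rate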

Table F.4 Resulting Simulation Policy 1 Results

Table F.5 Resulting Simulation Policy 2 Results In this Monte Carlo simulation problem, only one parameter had a probability distribution. In most realistic simulation problems, many parameters are simultaneously simulated to capture the dynamics of system behavior. Modeling these types of problems requires computer software systems that specialize in simulation. F.2.2.3. Comment on Computer Simulation Methods Many software systems support any sized problem. The illustration of Excel here is meant only to provide a rudimentary idea of how simulation methods can work. SPSS has a powerful simulation function that permits Big Data usage in both deterministic and probabilistic simulation models. The illustration of this function requires considerable programming of databases, rules, and equations, which is beyond the scope of this book.

F.3. Simulation Practice Problems Following are a couple of conceptual practice simulation problems, followed by their answers. Use these problems to practice the methodologies and concepts presented in this appendix. 1. A company has a service demand rate of 20 units 10 percent of the time, 30 units 40 percent of the time, and 40 units 50 percent of the time. Using the following random numbers (19, 45, 84, 5, 99), simulate five demand periods. Answer: 30, 30, 40, 20, and 40, respectively.

2. If there are four intervals in a simulation problem of 0 to 5, 6 to 10, 11 to 15, and 16 to 20 with related probabilities of 15 percent, 25 percent, 30 percent, and 30 percent, respectively, what “random number system” can be used to conduct the Monte Carlo simulation? Answer: 00 to 14, 15 to 39, 40 to 69, and 70 to 99.

G. Decision Theory

G.1. Introduction

Decision analysis involves a variety of methodologies, based on heuristics, principles, and optimization, that can aid decision-making. One common body of knowledge associated with decision analysis is referred to as decision theory. Decision theory (DT) is a field of study that applies mathematical and statistical methodologies to provide information on which decisions can be made. DT does not solve for optimal solutions the way linear programming does; instead, it uses decision-maker preferences and principles to select choices that better satisfy needs in particular problem-solving environments. Before using these DT methodologies, one must know the elements of the DT model to identify and correctly formulate the problem. The solution methodologies presented in this appendix are mathematically simple and can easily be rendered using Excel or SPSS. Simple spreadsheets can be used to generate the prescriptive analytic information from the models presented here.

G.2. Decision Theory Model Elements

There are three primary elements in all DT problems: alternatives, states of nature, and payoffs. 1. Decision alternatives or strategies—The independent decision variables in the DT model that represent the alternative strategies or choices of action from which only one may be selected. 2. States of nature—Independent events that are assumed to occur in the future, such as an economic recession. 3. Payoffs—Dependent parameters that are assumed to occur if a particular alternative is selected and a particular state of nature occurs, such as improved business performance. These three primary elements are combined into a payoff table to formulate the DT model. The general statement of a DT model is presented in Table G.1, where there are m alternatives and n states of nature. The number of alternatives does not have to equal the number of states of nature (m does not have to equal n), and there is a payoff value Pij (where i = 1, 2, …, m; j = 1, 2, …, n) for each combination of alternative i and state of nature j.

Table G.1 Generalized Statement of the DT Model

G.3. Types of Decision Environments There are three primary types of DT environments: certainty, risk, and uncertainty. 1. Certainty—Under this environment, the decision maker knows clearly what the alternatives are to choose from and the payoffs that each choice will bring. 2. Risk—Under this environment, some information on the likelihood of states of nature occurring is available but presented in a probabilistic fashion. 3. Uncertainty—Under this environment, no information about the likelihood of states of nature occurring is available.

G.4. Decision Theory Formulation The procedure for formulation of a DT model consists of the following general steps: 1. Identify and list as rows the alternatives to choose from. 2. Identify and list as columns the states of nature that can occur. 3. Identify and list the payoffs in the appropriate row and column. 4. Formulate the model as a payoff table. Using this procedure, consider the following DT problem. Suppose one wants to decide between two types of promotion efforts: A or B. The payoffs depend on the states of nature. In this problem, there are two states of nature: High Demand and Low Demand. If the selection is promotion strategy A, and one experiences a High Demand condition, the payoff will be $3 million in sales. With a Low Demand state of nature, it will result in sales equal to $1 million. If the selection is promotion strategy B, and one experiences a High Demand condition, it will result in $4 million in sales. With a Low Demand

state of nature, it will result in a loss of $2 million in sales. What is the DT model formulation for this problem? Using the four-step DT procedure, formulate this model accordingly: 1. Identify and list as rows the alternatives to choose from. There are two alternatives (A and B) and only one can be chosen. 2. Identify and list as columns the states of nature that can occur. In this problem, there are two states of nature (High Demand and Low Demand), so this results in a 2-by-2 sized payoff table. 3. Identify and list the payoffs in the appropriate row and column. The payoffs are in sales: $3, $1, $4, and $–2 million. 4. Formulate the model as a payoff table. The payoff table formulation of the complete model is presented in Table G.2.

Table G.2 DT Formulation of the Promotion Selection Problem Once a DT is formulated, the payoff table can be used to analyze the payoffs and render a decision. The methodologies that are used to solve a DT problem vary by type of decision environment.

G.5. Decision-Making Under Certainty Many different criteria can be used to aid in making decisions when the decision maker knows with certainty what the payoffs are in a given state of nature. Two of these criteria are maximax and maximin.

G.5.1. Maximax Criterion

The maximax criterion is an optimistic approach to decision-making. The maximax selection is based on the following steps: 1. Select the maximum payoff for each alternative. 2. Select the alternative with the maximum payoff of the maximum payoffs from Step 1. To illustrate this criterion, revisit the promotion selection problem. The solution to this problem is presented in Table G.3. As can be seen, the maximum payoffs for each of the two alternatives are $3 million and $4 million in sales, respectively. Of these, the $4 million payoff is the maximum payoff, so the

max of the max is $4 million with the selection of the Promotion B alternative.

Table G.3 Maximax Solution for DT Promotion Selection Problem

G.5.2. Maximin Criterion

The maximin criterion is a semi-pessimistic approach that assumes the worst state of nature is going to occur, and one should make the best of it. The maximin selection is based on the following steps: 1. Select the minimum payoff for each alternative. 2. Select the alternative with the maximum payoff of the minimum payoffs from Step 1. To illustrate this criterion, again revisit the promotion selection problem. The solution to this problem is presented in Table G.4. The minimum payoffs for each of the two alternatives are $1 million and $–2 million in sales, respectively. Of these, the $1 million payoff is the maximum payoff, so the max of the min is $1 million with the selection of the Promotion A alternative.

Table G.4 Maximin Solution for DT Promotion Selection Problem

Note that the two criteria give differing answers to the same problem, which might cause some concern. How can one criterion suggest one alternative and another criterion suggest a different alternative? Indeed, which alternative is the best? It depends on the criterion chosen to guide the decision. An optimist would choose the maximax approach, and a pessimist would choose the maximin approach.
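Both criteria reduce to one line each once the payoff table is stored as an array. A sketch with the promotion problem's payoffs (in $ millions) follows.

import numpy as np

alternatives = ["Promotion A", "Promotion B"]
payoffs = np.array([[3, 1],      # rows: alternatives; columns: High Demand, Low Demand
                    [4, -2]])

maximax_choice = alternatives[np.argmax(payoffs.max(axis=1))]   # best of the row maximums
maximin_choice = alternatives[np.argmax(payoffs.min(axis=1))]   # best of the row minimums

print(maximax_choice)   # Promotion B ($4 million)
print(maximin_choice)   # Promotion A ($1 million)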

G.6. Decision-Making Under Risk

Many criteria can aid in making decisions when the decision maker faces a risk environment, where the states of nature occur probabilistically. In such a decision environment, both the origin of the probabilities and the criteria used to make a decision are important.

G.6.1. Origin of Probabilities

In a risk problem, probabilities are attached to each state of nature. These probabilities must sum to one. In Appendix A, "Statistical Tools," a number of methodologies are presented to assess probabilities. Probabilities can come from objective or subjective sources. Objective source probabilities include experimental observation of historical data or the use of a statistical formula, such as a probability distribution. When using objective methods to determine probabilities, assume the following: 1. The probability of past events will follow the same pattern in the future. 2. The probabilities are stable in the process that is being observed. 3. The sample size is adequate to represent the past behavior. If these assumptions are not valid, an alternative way of determining probabilities involves the use of subjective source probabilities. This involves having experts make their best guesses at what a probability should be for each state of nature. Using this approach to probability assessment requires one to assume the experts are knowledgeable about the behavior for which they are assessing probabilities, and that their judgments are reasonably accurate.

G.6.2. Expected Value Criterion Many criteria can be used to aid in making decisions in a risk environment. Two of these criteria are Expected Value and Expected Opportunity Loss. The expected value (EV) criterion is determined by computing a weighted estimate of payoffs for each alternative. The EV criterion is based on the following steps: 1. Attach the probabilities for each state of nature to the payoffs in each row in the payoff table. 2. Multiply the probability in decimal form by each payoff and sum by row. These values are the expected payoffs for each alternative. 3. Select the alternative with the best payoff. If the problem has profit or sales payoffs, the best payoff would be the largest expected payoff. If

the problem has cost payoffs, the best payoff would be the smallest expected payoff. To illustrate this criterion with the promotion selection problem, set the probability of High Demand at 40 percent and the probability of Low Demand at 60 percent. The probabilities attached to the states of nature change this problem into a risk-type decision environment. To compute the expected values, the probabilities in percentages are changed to decimal values and multiplied by their respective payoff values. The EVs of each alternative are presented in the last column of the payoff table in Table G.5. As can be seen, the best payoff (maximum expected profit) is with the promotion A strategy at $1.8 million.

Table G.5 Expected Value Solution for DT Promotion Selection Problem
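The expected value computation in Table G.5 can be verified with a short sketch:

import numpy as np

payoffs = np.array([[3, 1],      # Promotion A: High Demand, Low Demand (in $ millions)
                    [4, -2]])    # Promotion B
probabilities = np.array([0.4, 0.6])

expected_values = payoffs @ probabilities      # probability-weighted payoff for each alternative
print(expected_values)                         # [1.8, 0.4] -> choose Promotion A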

G.6.3. Expected Opportunity Loss Criterion The expected opportunity loss criterion is based on the logic of the avoidance of loss. The decision using this criterion is based on minimizing the expected opportunity loss (what one stands to lose if the best decision for each state of nature is not selected). The procedure for computing the values on which this criterion is based involves the following steps: 1. Determine the opportunity loss values in not making the best decision in each state of nature. This is accomplished by selecting the best payoff under each state of nature and subtracting all the values in that column from that particular best payoff (including itself). The result of this difference is called opportunity loss. The opportunity loss values can be structured into an opportunity loss table represented by the same framework as the DT payoff table. 2. Attach the probabilities to the opportunity loss values, and compute expected opportunity loss values for each alternative by summing the products of the probabilities and their respective opportunity loss values. 3. Select the alternative with the minimum expected opportunity loss value computed in Step 2.

The steps to this criterion in solving the DT promotion selection problem are presented in Tables G.6 and G.7.

Table G.6 Step 1 of Expected Opportunity Loss Solution for DT Promotion Selection Problem

Table G.7 Steps 2 and 3 of Expected Opportunity Loss Solution for DT Promotion Selection Problem 1. Determine the opportunity loss values in not making the best decision in each state of nature. This is accomplished by selecting the best payoff under each state of nature and subtracting all the values in that column from that best payoff. The opportunity loss values can be structured into an opportunity loss table represented by the same framework as the DT payoff table in Table G.6. So, under the High Demand state of nature if the alternative Promotion B is selected, there will be “0” opportunity loss, since this is the best possible payoff in this state of nature. Alternatively, if Promotion A is selected, there will be an opportunity loss of $1 million in sales (i.e., $4 – $3 = $1 of loss), since with that alternative $4 million could have been made instead of just $3 million. 2. Attach the probabilities to the opportunity loss values and compute expected opportunity loss values for each alternative by summing the

products of the probabilities and their respective opportunity loss values. 3. Select the minimum expected opportunity loss value computed in Step 2. The minimum expected opportunity loss is with the promotion A alternative with a value of only $0.4 million.
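A sketch of the same three steps, reusing the payoff array from the maximax/maximin example:

import numpy as np

payoffs = np.array([[3, 1], [4, -2]])            # rows: Promotion A, B; columns: High, Low Demand
probabilities = np.array([0.4, 0.6])

opportunity_loss = payoffs.max(axis=0) - payoffs   # best payoff in each column minus each payoff
expected_loss = opportunity_loss @ probabilities    # [0.4, 1.8] in $ millions
print(expected_loss.argmin())                       # 0 -> Promotion A, $0.4 million expected opportunity loss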

G.7. Decision-Making under Uncertainty Decision-making under uncertainty means that the decision maker has no information at all on which the state of nature will occur. Although many different criteria can be used in this environment, consider the following five: Laplace, Maximin, Maximax, Hurwicz, and Minimax.

G.7.1. Laplace Criterion

The Laplace criterion is based on the Principle of Insufficient Reason. It is assumed under this principle that because no information is available on any state of nature, each is equally likely to occur. As such, one can assign an equal probability to each state of nature and then compute an expected value for each alternative. The Laplace selection is based on the following steps: 1. Attach an equal probability to each state of nature. 2. Compute an expected value for each alternative using the expected value criterion. 3. Select the alternative with the best expected value computed in Step 2. We can again illustrate this criterion by revisiting the promotion selection problem. The solution to this problem is presented in Table G.8.

Table G.8 Laplace Solution for DT Promotion Selection Problem

1. Attach an equal probability to each state of nature. Because there are two states of nature, the probability of each is 50 percent or 0.50. 2. Compute an expected value for each alternative. The expected value computations are as follows:

Promotion A: 3(0.50) + 1(0.50) = $2 million
Promotion B: 4(0.50) + (–2)(0.50) = $1 million

3. Select the alternative with the best expected value computed in Step 2. The best alternative is Promotion A at $2 million in sales.

G.7.2. Maximin Criterion The maximin criterion is the same as it was under certainty. The solution is the same as given before.

G.7.3. Maximax Criterion The maximax criterion is the same as it was under certainty. The solution is the same as given before.

G.7.4. Hurwicz Criterion The Hurwicz criterion uses the decision maker’s subjectively weighted degree of optimism of the future. The coefficient of optimism is used for this weighting. The coefficient of optimism is on a scale from 0 to 1 and is represented by the Greek letter alpha, or α. The closer alpha is to 1, the more optimistic the decision maker is about the future. The coefficient of pessimism is 1 − α. The Hurwicz selection is based on the following steps: 1. State the value of alpha or α. 2. Determine the maximum and minimum payoffs for each alternative. 3. Multiply the coefficient of optimism (α) by the maximum payoff, multiply the coefficient of pessimism (1 − α) by the minimum payoff, and add these values together to derive the expected value for each alternative. 4. Select the alternative with the best expected payoff from Step 3. To illustrate this criterion again, revisit the DT promotion selection problem. The solution to this problem is presented in Table G.9. 1. State the value of α. Let α = 0.7. This means one is more optimistic (closer to 1). 2. Determine the maximum and minimum payoffs for each alternative.

Table G.9 Hurwicz Solution to the DT Promotion Selection Problem 3. Multiply the coefficient of optimism (α) by the maximum payoff, multiply the coefficient of pessimism (1 − α) by the minimum payoff, and add these values together to derive the expected value for each alternative.

Promotion A: 3(0.7) + 1(1 − 0.7) = $2.4 million
Promotion B: 4(0.7) + (–2)(1 − 0.7) = $2.2 million

4. Select the best expected payoff from Step 3. The best payoff is with the Promotion A alternative at $2.4 million in sales.
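A sketch of the Hurwicz weighting with α = 0.7:

import numpy as np

payoffs = np.array([[3, 1], [4, -2]])    # Promotion A, B (in $ millions)
alpha = 0.7                               # coefficient of optimism

hurwicz = alpha * payoffs.max(axis=1) + (1 - alpha) * payoffs.min(axis=1)
print(hurwicz)                            # [2.4, 2.2] -> choose Promotion A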

G.7.5. Minimax Criterion

The minimax criterion is similar to the expected opportunity loss criterion in that it is based on avoidance of loss. The decision using this criterion is based on minimizing the maximum opportunity loss (the regret of not having chosen the best alternative for the state of nature that occurs). The procedure for computing the values based on the minimax criterion consists of the following steps: 1. Determine the opportunity loss values in not making the best decision in each state of nature. This is accomplished by selecting the best payoff under each state of nature and subtracting all the values in that column from that particular best payoff. The opportunity loss values can be structured into an opportunity loss table represented by the same framework as the DT payoff table. 2. Determine the maximum opportunity loss value for each alternative. 3. Select the alternative with the minimum of the maximum opportunity loss values determined in Step 2. The steps to this criterion in solving the DT promotion selection problem are presented in Table G.10.

Table G.10 Minimax Solution of the DT Promotion Selection Problem 1. Determine the opportunity loss values in not making the best decision in each state of nature. This is accomplished by selecting the best payoff under each state of nature and subtracting all the values in that column from that particular best payoff. The opportunity loss values can be structured into an opportunity loss table represented by the same framework as the DT payoff table.

2. Determine the maximum opportunity loss values for each alternative.

3. Select the minimum opportunity loss value determined in Step 2. The minimum of the maximum opportunity loss values is with the Promotion A alternative, with a maximum opportunity loss of $1 million in sales.
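A sketch of the minimax (opportunity loss) computation:

import numpy as np

payoffs = np.array([[3, 1], [4, -2]])       # Promotion A, B
regret = payoffs.max(axis=0) - payoffs       # opportunity loss table
max_regret = regret.max(axis=1)              # worst opportunity loss per alternative: [1, 3]
print(max_regret.argmin())                   # 0 -> Promotion A (maximum loss of $1 million)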

G.8. Expected Value of Perfect Information The expected value of perfect information (EVPI) is the difference between the expected value under a decision environment of certainty and the expected value under a decision environment of risk. In other words, if one knows exactly what state of nature will exist in the future, select the payoff maximizing action (with certainty), and then compare these optimal choices with the choices made using expected value analysis (under risk). The difference in these two values would be the EVPI. The value of EVPI is an upper boundary on what one is willing to pay for perfect information on the future states of nature. Consider the calculation of EVPI based on expected payoffs in a risk environment. By calculating the expected profits for a personal computer (PC) rental problem (as presented in Table G.11), one would select the strategy or action of making three PCs available for customers based on the maximum expected profit of $115. The expected profit of $115 represents the best decision under risk. The calculations of the expected values, however, consider all the possible event outcomes (having only one PC available, 2 PCs available, and so on). In Table G.11, the values with an asterisk (*) are the maximum profit payoffs for each event. If it was known with certainty which of the events would occur, one could select the actions that would maximize profit.

Table G.11 Expected Value Payoffs for Each Personal Computer Rental Action

In Table G.12, the calculations of the profit under certainty are presented. The expected profit under certainty is $162. So to make the best decision for all possible outcomes, one can expect a maximum profit of $162 in rentals per day. The difference between the maximum expected profit under certainty and the maximum expected profit under risk is EVPI, or:

EVPI = [Maximum Expected Payoff (under certainty)] − [Maximum Expected Payoff (under risk)]
EVPI = $162 − $115 = $47

Table G.12 Expected Value under Best Action with Payoff Certainty

So obtaining perfect information in the PC rental problem is worth at most $47 per day.
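The payoffs behind Tables G.11 and G.12 are not repeated in the text, so the sketch below instead computes EVPI for the promotion selection problem under risk, using the 0.4 and 0.6 probabilities from Section G.6. Note that the result equals the minimum expected opportunity loss found earlier, which is a general property of EVPI.

import numpy as np

payoffs = np.array([[3, 1], [4, -2]])        # Promotion A, B (in $ millions)
probabilities = np.array([0.4, 0.6])

best_under_risk = (payoffs @ probabilities).max()            # 1.8 (Promotion A)
best_under_certainty = payoffs.max(axis=0) @ probabilities   # 0.4*4 + 0.6*1 = 2.2

print(best_under_certainty - best_under_risk)                # EVPI = 0.4 ($ millions)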

G.9. Sequential Decisions and Decision Trees Some decision situations require a sequence of decisions to be made. Decisions that are dependent on one another in a sequence are called sequential decisions. One statistical methodology that is useful in understanding a sequential decision problem formulation is a decision tree. A decision tree is a graphical aid that can be used to depict a sequence of decisions in a horizontal tree-like structure. The branches of the tree represent the decision paths that a decision maker may choose to take in the sequence. Consider a survey/product introduction sequential decision problem to illustrate a sequential solution procedure using decision trees. Suppose a marketing manager must decide whether to introduce a new product. The decision tree mapping of the problem is presented in Figure G.1. Note at the top of Figure G.1 that there are two sequential decisions included in the problem. The first is deciding if a survey on customer demand is to be undertaken, and the second is deciding whether to introduce a product. In each decision, there are alternative actions that can be taken, events (some with probabilities), and payoffs.

Figure G.1 Decision tree of sequential survey/product introduction problem To solve this sequential decision problem, use the backward decision method. The backward decision method for sequential decision problems combines the economic criterion of the maximax strategy with the expected value of the probability decision criterion discussed previously in this appendix. The backward decision method begins, as its name implies, at the payoffs at the back of the second decision. Using the maximax strategy with an environment of certainty, select the maximum payoffs for all the branches in the second decision first. The payoffs are in millions of dollars. In Figure G.2, it is apparent one would choose either the action of Introduce Product with a payoff of $6 million or the action of No Introduction with a payoff of $–2 million. So the maximax strategy would be to choose to Introduce Product with payoffs of $6 million, and the No Introduction alternative would be discarded. The double bars, or ||, indicate that the branches are not chosen and discarded. Note the maximax payoffs are brought forward to the Second Decision box in Figure G.2. In effect, this problem is being worked backward from the second decision to the first.

Figure G.2 Step 1 of backward solution to sequential survey/product introduction problem We now can use the probability information on the events to calculate the expected value payoffs on which the first decision’s action can be based. The expected value calculations are presented in Figure G.3. The resulting expected value of Take Survey action is $2.16 million, and the certainty payoff of No Survey is $6 million. So the manager would choose not to take a survey and introduce the product with the expectation of a $6 million payoff.

Figure G.3 Step 2 of backward solution to sequential survey/product introduction problem This backward decision method assumes that the maximax strategy and the resulting expected values are, on average, reflective of the payoff values expected for the problem situation. The use of the solution method can be expanded to more complex problems involving three, four, or more sequential decisions. In such problems, there may be more than one set of probabilities for possible events in the sequential decision. Assume in these problems that the probabilities for each subsequent decision are statistically independent of the outcomes of each decision. If the probabilities are not independent, their conditional probability nature must be analytically considered in the sequential decision process. One way to calculate these conditional probabilities is through the use of Bayes’s theorem (explained in the next section).

G.10. The Value of Imperfect Information: Bayes’s Theorem Most additional information is imperfect in that it is usually obtained in a survey or research of a sample of information, rather than a population of all information. The value of EVPI is that it provides an upper boundary of possible investment for additional information in decision-making under risk. Any information, even imperfect information that improves the chances of making a correct decision and increases the expected payoffs, may be worth the additional cost of obtaining it. The procedure by which to determine the value of imperfect additional information involves the use of Bayes’s theorem. Bayes’s theorem can be used to revise prior or given probabilities by using conditional probability information (that is, the additional, imperfect information). Bayes’s theorem reverses the events in a conditional probability (P(A | B) to find P(B | A)). The formula based on Bayes’s theorem that reverses conditional probabilities follows:

P(B | A) = [P(B) × P(A | B)] / [Σ P(Bi) × P(A | Bi)]

where:

P(B | A) = conditional probability of event B, given event A
P(Bi) = probability of the i = 1, 2, 3, …, n mutually exclusive and collectively exhaustive events B
P(A | Bi) = conditional probability of event A, given each event Bi

Bayes's theorem is based on the rule of multiplication (see Appendix A, Section A.3.3) when events A and B are not independent. Under that rule, the joint probability P(A and B) is found this way:

P(A and B) = P(B) × P(A | B)

This rule can be converted into Bayes's formula by dividing the joint probability by the marginal probability of A, or P(B | A) = P(A and B) / P(A). The P(A) denominator is the marginal probability of all the joint P(A and B) probabilities; it is the sum of the products P(Bi) × P(A | Bi). The term marginal probability comes from the fact that this probability is usually obtained from the margins (where summations of probabilities are usually located) of joint

probability tables. This summation is divided into a single P(A and B) value to obtain the desired reversed conditional probability P(B | A). To see how Bayes's theorem is applied in business analytics and decision-making, consider a modified version of the sequential decision-making situation in Section G.9. Suppose one is facing a business decision concerning the introduction of a new product, similar to the decision tree presented in Figure G.4.

Figure G.4 Decision tree for modified survey/product introduction problem There are two action choices: Introduce Product and No Introduction. After reviewing the new product, make a subjective judgment on the new product’s sales potential. Such probabilities in Appendix A were called subjective probabilities because of their judgmental origin. These prior or given probabilities of the two events of High Sales and Low Sales are presented in Table G.13.

Table G.13 Payoff Table and Prior Probabilities for Survey/Product Introduction Problem

Based on the judgmental sales potential, the profit payoff values for each action are estimated and presented in Table G.13. Based on this prior information, we can determine the expected sales payoffs for each alternative action as:

Expected Payoff for Action:
Introduce Product = (0.2)($6.0) + (0.8)($–2.0) = $–.4 million
No Introduction = (0.2)($0) + (0.8)($0) = $0

Based solely on these expected sales values, choose the action of No Introduction to minimize the loss. (Losing $0 is better than losing $.4 million in sales.) On the other hand, the EVPI in this decision situation follows:

EVPI = (0.2)($6.0) + (0.8)($0) − $0 = $1.2 million

That is, the expected payoff under certainty is $1.2 million, the best expected payoff under risk (No Introduction) is $0, and their difference is $1.2 million. The EVPI indicates that considerable expected sales are possible in this problem if one has perfect information. It can also be interpreted to justify pursuing additional imperfect information if the cost of that imperfect information is less than the expected contribution of $1.2 million that the sales will profit the organization. This problem now becomes a sequential decision-making situation, where the first decision is whether to obtain additional information (No Survey or Take Survey), and the second decision is whether to Introduce Product or choose No Introduction. In the upper branches of the sequential decision tree in Figure G.4, the original product introduction decision is presented. Suppose a survey is to be conducted to obtain the additional, imperfect information on which to base the product introduction decision. The purpose of the survey is to determine the successfulness of the new product. The survey will have two possible events: Survey Predicts Success or Survey Predicts Failure. The lower branches of the decision tree in Figure G.4 present this sequence of decisions. Given the problem presented in Figure G.4, one might consider using the backward solution method to determine the expected payoffs and arrive at a decision. Unfortunately, the decision cannot

be made this way until one determines the probabilities for the events of survey prediction in the first decision. To obtain the information on the events of survey prediction, one cannot use simple probabilities. The probability of a survey predicting success or failure in general may be meaningless. (It may have nothing to do with the introduction of this new product or resulting sales.) Instead, one must recognize the dependence of the event probabilities in the sequence of decisions. (Probabilities of survey results and actual sales can be related or dependent.) In this problem, decide first if additional information (via a survey) is to be obtained, and second, if the product will be introduced. Because it is necessary to start backward in the problem with the payoffs for the second decision, one must determine the probability of the second decision's events occurring, given that the first decision and its events have occurred. So, the probabilities must be determined to reflect this sequence of decision-making. Specifically, determine the conditional probabilities of the actual sales given the survey results. To do this, use Bayes's theorem and some additional objective probabilistic information. The procedure for revising prior probabilities using Bayes's theorem consists of the following steps: 1. Obtain the prior and conditional probabilities for the events in the decision-making situation—In the survey/product introduction problem, the prior probabilities (P(H) = 0.2 and P(L) = 0.8) are given. The conditional probabilities are the additional information that is being brought into this problem. Note that based only on the prior probabilities, one would not introduce the product. The conditional probabilities can be obtained from objective sources (past history of surveys on similar products or similar surveys), or they can come from subjective sources (additional experts with experiential judgment information). In the case of the survey/product introduction problem, use the conditional probabilities presented in Table G.14. Note in that table that the probability of the survey predicting a successful product, given that High Sales are experienced, is 0.6.

Table G.14 Conditional Probabilities of Survey Results, Given Actual Sales for Survey/Product Introduction Problem: P(Survey Results | Actual Results)

2. Convert the prior and conditional probabilities into joint and marginal probabilities—The formulas for this conversion were presented in Appendix A and are repeated for this problem in Table G.15 (A). The computations for the joint probabilities are presented in Table G.15 (B). As can be seen in Table G.15 (B), the joint probability of having High Sales and a survey result of a successful product is found by the following equation:

P(H and S) = (Prior probability of H) × (Conditional probability of S given H) = P(H) × P(S | H) = (0.2)(0.6) = 0.12

The marginal probabilities of the survey results are found by adding the joint probabilities for all sales events. So the marginal probability of a survey predicting that a product will be successful follows:

P(S) = P(H and S) + P(L and S) = 0.12 + 0.24 = 0.36

Table G.15 Joint Probability Table Computations for Survey/Product Introduction Problem: P(Actual Sales and Survey Results)

3. Compute the revised or posterior probabilities using Bayes's theorem—The term posterior probabilities indicates that the prior probabilities have been revised to include additional probabilistic information (the conditional probabilities of survey results). Hence, the posterior probabilities are after, or posterior to, the prior probabilities. The computation of the posterior probabilities is accomplished using Bayes's theorem. The posterior probabilities for each of the end branches or payoff branches in Figure G.4 must be computed to reflect the addition of survey result information in the decision process. Using Bayes's theorem, the computation of the posterior probability of having High Sales given a survey result of the product being successful follows:

P(H | S) = P(H and S) / P(S) = 0.12 / 0.36 = 0.333

Note that the resulting posterior probability of 0.333 is greater than the prior probability of 0.2, indicating a revision based on the additional information. The other three posterior probabilities can be similarly computed as follows:

P(L | S) = P(L and S) / P(S) = 0.24 / 0.36 = 0.667
P(H | F) = P(H and F) / P(F) = 0.08 / 0.64 = 0.125
P(L | F) = P(L and F) / P(F) = 0.56 / 0.64 = 0.875
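The prior-to-posterior revision can be verified with a short sketch. The conditional probabilities 0.6/0.4 and 0.3/0.7 follow from the joint probabilities computed above (for example, P(L and S) = 0.24 implies P(S | L) = 0.3).

priors = {"High": 0.2, "Low": 0.8}
p_success_given = {"High": 0.6, "Low": 0.3}      # P(Survey Predicts Success | actual sales level)

# Joint and marginal probabilities (Table G.15 logic)
joint_success = {s: priors[s] * p_success_given[s] for s in priors}          # {High: 0.12, Low: 0.24}
p_success = sum(joint_success.values())                                      # 0.36
p_failure = 1 - p_success                                                    # 0.64

# Posterior probabilities via Bayes's theorem
posterior_given_success = {s: joint_success[s] / p_success for s in priors}  # {High: 0.333, Low: 0.667}
joint_failure = {s: priors[s] * (1 - p_success_given[s]) for s in priors}
posterior_given_failure = {s: joint_failure[s] / p_failure for s in priors}  # {High: 0.125, Low: 0.875}

print(posterior_given_success, posterior_given_failure)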

This three-step procedure can be used on any size problem where the necessary prior and conditional probabilities are available. We can now use the marginal and posterior probabilities in combination with the backward solution method to resolve the survey/product introduction problem using sequential decision-making. In Figure G.5, the marginal and posterior probabilities are incorporated into the decision tree. As can be seen, the posterior probabilities are positioned at the end of the branches for the Take Survey decision. The expected values for each of the branches can then be computed in the same way as in the decision tree problem presented earlier. The expected values for all the No Introduction choices are $0, but now, because of the revision of the probabilities, there is a positive sales payoff of $.664 million in sales in one of the branches. Using the maximax decision criterion, choose the best payoffs from the second decision, and bring them forward to be considered in the first decision. Because the events of survey results have a probable occurrence (measured by marginal probabilities), compute a second expected value of $.239 million, representing the expected payoff for the choice of Take Survey. The decision then comes down to a choice between No Survey, which results in a $0 payoff, or Take Survey, which results in a $.239 million expected payoff. The best choice using the expected value criterion would be to Take Survey. If the survey predicts Success, one would choose to Introduce Product, with an expected payoff of $.664 million. If the survey predicts Failure, one would choose No Introduction, and the resulting payoff would be $0.

Figure G.5 Sequential decision-making solution for survey/product introduction problem
In this problem, a decision based solely on prior probabilities produced expected sales of $0. By adding the conditional probabilities of survey results, expected sales increased to $.239 million. The difference is due entirely to the added information: the information is imperfect, yet it yields an expected sales increase (over the original decision) of $.239 million. Users should be aware of several factors when using this procedure. First, the expected value of imperfect information is an expected value and, as such, one is not assured of receiving it. Second, the additional imperfect information has yet to be obtained, so its true value cannot be known in advance. Additional information is valuable only if it improves the ability to make correct decisions and increases the expected payoffs received once it is used in the decision process. Third, the assumed dependency between the events may or may not be reliable; using the probabilities of one event to revise those of another is not valid unless the relationship can be demonstrated. Finally, the cost of obtaining the additional information has not been discussed. If the cost of taking the survey exceeds the $.239 million gain in expected sales, one would not take the survey.
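On that last point, the survey is worth purchasing only if its expected gain exceeds its cost. A minimal sketch of the comparison, assuming a purely hypothetical survey cost (none is given in the text):

# Expected value of the imperfect survey information, in $ millions
ev_take_survey = 0.239            # from the roll-back above
ev_no_survey = 0.0                # expected sales using priors alone
expected_gain = ev_take_survey - ev_no_survey
survey_cost = 0.10                # hypothetical figure, for illustration only
print("Take the survey" if expected_gain > survey_cost else "Skip the survey")   # Take the survey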

G.11. Decision Theory Practice Problems
Below are several practice decision theory problems, each followed by its answer. Use these problems to practice the methodologies and concepts presented in this appendix.
1. A company would like to invest in one of three types of resources: new personnel, new technology, or new processes. The firm operates in an environment that offers no assurance of what the economic climate will be, nor does it have information on what that climate will most likely become. The projected profit for an investment in the personnel resource is either $2.3 million per year if a prosperous market exists or only $1.1 million if a depressed market exists. The projected profit for an investment in the technology resource is either $2.8 million per year if a prosperous market exists or only $1.3 million if a depressed market exists. The projected profit for an investment in the process resource is either $0.7 million per year if a prosperous market exists or $4.2 million if a depressed market exists.
a. What is the formulation of this DT model?

b. Which DT environment does this problem fall into: certainty or uncertainty?
Answer: Because no probabilities are available for the states of nature, this is an uncertainty problem.
c. Using the maximax criterion, what is the best choice?
Answer: Processes, with a payoff of $4.2 million.
d. Using the maximin criterion, what is the best choice?
Answer: Technology, at $1.3 million.
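These answers can also be checked programmatically. The following minimal Python sketch (again, not a tool used in this book) encodes Problem 1's payoff table and the uncertainty criteria from this appendix; applied to Problem 2's table below, the same functions return Technology A (Laplace average of $3.75 million, maximum regret of $1.5 million).

# Payoff table for Problem 1, in $ millions: [prosperous market, depressed market]
payoffs = {
    "Personnel":  [2.3, 1.1],
    "Technology": [2.8, 1.3],
    "Processes":  [0.7, 4.2],
}

def maximax(p):            # best of the best payoffs
    return max(p, key=lambda a: max(p[a]))

def maximin(p):            # best of the worst payoffs
    return max(p, key=lambda a: min(p[a]))

def laplace(p):            # best average payoff (states treated as equally likely)
    return max(p, key=lambda a: sum(p[a]) / len(p[a]))

def minimax_regret(p):     # smallest maximum opportunity loss
    n = len(next(iter(p.values())))
    best_by_state = [max(row[j] for row in p.values()) for j in range(n)]
    regret = {a: max(best_by_state[j] - row[j] for j in range(n)) for a, row in p.items()}
    return min(regret, key=regret.get)

print(maximax(payoffs))    # Processes  ($4.2 million)
print(maximin(payoffs))    # Technology ($1.3 million)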

2. Due to a favorable stock market outcome, a firm has an opportunity to invest in new IT to support two new e-commerce markets it is developing for selling its services: business-to-business (B2B) and business-to-consumer (B2C). To support these markets, the firm can invest in one of three technologies: A, B, or C. The estimated yearly profit provided by technology A is $3.5 million in a B2B market and $4.0 million in a B2C market. The estimated yearly profit provided by technology B is $2.2 million in a B2B market and $4.9 million in a B2C market. The estimated yearly profit provided by technology C is $5.0 million in a B2B market and $2.0 million in a B2C market.
a. What is the formulation of this DT model?

b. Is this problem a certainty problem or an uncertainty problem?
Answer: Because no probabilities are available for the states of nature, this is an uncertainty problem.
c. If one uses the Laplace criterion, what is the best choice?
Answer: Technology A, with an average payoff of $3.75 million.
d. If one uses the minimax criterion, what is the best choice?
Answer: Technology A, with a maximum regret of $1.5 million.
The data that follows is used in Problems 3 and 4. Suppose one has the following payoff table:

3. Using the maximax, maximin, or Laplace criterion, what is the best choice?
Answer: maximax: Action A (best payoff 560); maximin: Action A (worst payoff 120); Laplace: Action B (average payoff 292.5).
4. The probabilities for the four events are 0.40, 0.35, 0.15, and 0.10, respectively. What is the expected value of each of the three alternative actions? Which action is the best choice based on expected value analysis?
Answer: A = 216, B = 305, C = 297.5; best choice = B.
5. Recalculate the expected value of additional imperfect information for the survey/product introduction problem in Table G.13 using the revised payoff table that follows:

Answer: EVPI = 0.2(24) + 0.8(0) = 4.8; EV(Introduce Product) = 0.2(24) + 0.8(0) = 4.8; EV(No Introduction) = 0.2(0) + 0.8(0) = 0.
6. What is the best decision in the decision trees in Figures G.A and G.B if one wants to maximize the expected payoffs?
Answer: B is the best choice, with an EV of 427, as opposed to 173.9 for A.

Figure G.A Decision tree

Figure G.B Decision tree
7. Figure G.7 shows a profit-maximizing decision tree. Using the backward solution method, determine the best decision. What is the EMV of the “best” decision?
Answer: No is the best choice, with an EV of 154.3.

Figure G.7 Profit-maximizing decision tree

Index A addition, rules of, 173-174 additive time series model, 274 additivity in LP (Linear Programming) models, 232 administrators, 31 aligning business analytics, 45-46 management issues, 54 change management, 58-59 ensuring data quality, 55-57 establishing information policy, 54 measuring business analytics contribution, 58 outsourcing business analytics, 55 organization structures, 46-50 centralized BA organization structure, 49-50 functional organization structure, 48 hierarchical relationships, 46 matrix organization structure, 48 project structure, 47-48 reasons for BA initiative and organization failure, 51-50 teams, 50-53 collaboration, 50-53 participant roles, 52 reasons for team failures, 53 alternatives (DT), 304 Analysis ToolPak, 39 analytics. See also DT (decision theory) alignment. See business analytics alignment analytic purposes and tools, 5 business analytics personnel, 30-33 administrators, 31 BAP (Business Analytics Professional) exam, 30-31 designers, 31 developers, 31 skills and competency requirements, 32-33 solution experts, 31

technical specialists, 31 business analytics process data measurement scales, 8 explained, 7-10 relationship with organization decision-making process (ODMP), 1012 characteristics of, 6 decision analysis. See DT (decision theory) definition of, 3-4 descriptive analytics analytic purposes and tools, 5 confidence intervals, 76-77 definition of, 4 descriptive statistics, 67-72 marketing/planning case study example, 80-90 overview, 63-64 probability distributions, 78-80 sampling estimation, 76-77 sampling methods, 73-75 statistical charts, 64-67 supply chain shipping problem case study, 141-145 predictive analytics analytic purposes and tools, 5 data mining, 97-102 data-driven models, 96-97 definition of, 4 logic-driven models, 94-96 marketing/planning case study example, 102-114 methodologies, 119-120 overview, 93-94 prescriptive modeling, 120-122 supply chain shipping problem case study, 147-157 prescriptive analytics analytic purposes and tools, 5 definition of, 4 integer programming. See IP (integer programming) regression analysis, 97

Durbin-Watson Autocorrelation Test, 284 multiple regression models, 281-284 simple regression model, 276-281 sensitivity analysis economic value of resources, determining, 258-259 overview, 242-243 primal maximization problems, 243-251 primal minimization problems, 251-258 analytics analysts, 51 analytics modelers, 51 analytics process designers, 51 ANOVA testing, 9, 195 applications of business analytics to enhance decision-making, 23-24 applied LP (Linear Programming) model, 202 area charts, 65 artificial variables, 219 assessing probability Frequency Theory, 171-172 Principle of Insufficient Reason, 172 rules of addition, 173-174 rules of multiplication, 174-177 associations, 39, 99 averages, smoothing, 286-288

B BA team heads, 51 backward decision method, 317-320 BAP (Business Analytics Professional) exam, 30-31 bar charts, 65 Bayes’s theorem, 321-328 belief of physical proximity, 51 BI (business intelligence), 5-6 billing and reminder systems, 34 binomial probability distribution, 179-181 binomial tests, 199 blending formulations, 230 branch-and-bound method, 264-267

business analytics alignment, 45-46 management issues, 54 change management, 58-59 ensuring data quality, 55-57 establishing information policy, 54 measuring business analytics contribution, 58 outsourcing business analytics, 55 organization structures, 46-50 centralized BA organization structure, 49-50 functional organization structure, 48 hierarchical relationships, 46 matrix organization structure, 48 project structure, 47-48 reasons for BA initiative and organization failure, 51-50 teams, 50-53 collaboration, 50-53 participant roles, 52 reasons for team failures, 53 business analytics personnel, 30-33 administrators, 31 BAP (Business Analytics Professional) exam, 30-31 designers, 31 developers, 31 skills and competency requirements, 32-33 solution experts, 31 technical specialists, 31 business analytics process data measurement scales, 8 explained, 7-10 relationship with organization decision-making process (ODMP), 10-12 Business Analytics Professional (BAP) exam, 30-31 business domain experts, 52 business intelligence (BI), 5-6 business performance tracking, 24 butcher problem example (LP), 208-210

C CAP (Certified Analytic Professional), 30

case studies explained, 121 marketing/planning case study example. See marketing/planning case study example supply chain shipping problem case study descriptive analytics analysis, 141-145 predictive analytics analysis, 147-157 prescriptive analysis, 158-163 problem background and data, 139-140 categorical data, 8 categorizing data, 33-35 cause-and-effect diagrams, 95 centralized BA organization structure, 49-50 certainty decision-making under certainty, 306 maximax criterion, 306 maximin criterion, 307 explained, 304 in LP (Linear Programming) models, 232 Certified Analytic Professional (CAP), 30 championing change, 59 change management, 58-59 best practices, 59-60 targets, 59 charts marketing/planning case study example case study background, 81 descriptive analytics analysis, 82-90 statistical charts, 65-67 CHISQ.TEST, 199 Chi-Square tests, 199 Claritas, 35 Clarke Special Parts problem example, 214-215 classification, 39, 99 clearly stated goals, 59 cluster random sampling, 73

clustering data mining, 39, 99 hierarchical clustering, 100 K-mean clustering, 100-102 coding, checking for, 57 coefficient of kurtosis, 68 coefficient of skewedness, 68 Cognizure BAP (Business Analytics Professional) exam, 30-31 collaboration lack of, 50 in teams, 50-53 column charts, 65 combinations, 169 communication good communication, 59 lack of, 53 competency requirements for business analytics personnel, 32-33 competition data sources, 35 competitive advantage achieving with business analytics, 20-21 innovation, 21 operations efficiency, 21 price leadership, 21 product differentiation, 21 service effectiveness, 21 sustainability, 21 completeness, checking for, 57 computer simulation methods, 301 conditional probabilities, 176 confidence coefficient, 79 confidence intervals, 76-77 constrained optimization models, 128-129 constraints formulating, 130-131 LP (Linear Programming), 204-206 continuous probability distributions, 185-192

exponential probability distribution, 190-192 normal probability distribution, 186-189 correlation analysis, 97 counting, 167 combinations, 169 permutations, 167-168 repetitions, 170 credit union example of business analysis, 19 CRM (customer relationship management) systems, 34 culture as target of change management, 59 current data, checking for, 57 Curve Estimation (SPSS), 288-289 curve fitting explained, 123-129 SPSS Curve Estimation, 288-289 supply chain shipping problem case study, 147-154 customer demographics, 35 customer internal data, 34 customer profitability, increasing, 23 customer relationship management (CRM) systems, 34 customer satisfaction, 35 customer service problem example (LP), 213-214 cyclical variation, 275

D data inspection items, 57 data management technology, 37 data managers, 52 data marts, 38 data measurement scales, 8 data mining, 38-40, 97-98 methodologies, 99-102 discriminant analysis, 100 hierarchical clustering, 100 K-mean clustering, 100-102 logistic regression, 100 neural networks, 100

types of information, 99 simple illustration of, 98-99 data privacy, 36 data quality ensuring, 55-57 overview, 35-36 data sets, 3 data sources categorizing data, 33-35 data privacy, 35-36 data quality, 35-36 external sources, 34-35 internal sources, 34 new sources of data, applying business analytics to, 23-25 data visualization marketing/planning case study example case study background, 81 descriptive analytics analysis, 82-90 statistical charts, 64-67 data warehouses, 38 database management systems (DBMS), 37-36 databases, 3 database encyclopedia content, 36 DBMS (database management systems), 37-36 data-driven models, 96-97 DBMS (database management systems), 37-36 decision environments. See also DT (decision theory) certainty decision-making under certainty, 306-307 explained, 304 risk decision-making under risk, 307-311 explained, 304 uncertainty decision-making under uncertainty, 311-315 explained, 305

decision theory. See DT (decision theory) decision trees, 317-320 decision variables, defining, 130 delegation of responsibility, 51 descriptive analytics analytic purposes and tools, 5 confidence intervals, 76-77 definition of, 4 descriptive statistics, 67-72 marketing/planning case study example, 80 case study background, 81 descriptive analytics analysis, 82-90 overview, 63-64 probability distributions, 78-80 sampling estimation, 76-77 sampling methods, 73-75 statistical charts, 65-67 supply chain shipping problem case study, 141-145 actual monthly customer demand in motors, 143 Chicago customer demand (graph), 143 estimated shipping costs per motor, 141 Excel summary statistics of actual monthly customer demand in motors, 144 Houston customer demand (graph), 143 Kansas City customer demand (graph), 145 Little Rock customer demand (graph), 145 Oklahoma City customer demand (graph), 145 Omaha customer demand (graph), 145 problem background and data, 140 SPSS summary statistics of actual monthly customer demand in motors, 144 designers, 31 deterministic simulation, 295-296 developers, 31 diagrams cause-and-effect diagrams, 95 influence diagrams, 95-96

diet problem example (LP), 210-212 differential calculus, 134 digital analytics, 23-25 discrete probability distributions, 178-184 binomial probability distribution, 179-181 geometric probability distribution, 184 hypergeometric probability distribution, 184 Poisson probability distribution, 182-184 discriminant analysis, 100 divisibility in LP (Linear Programming) models, 232 downloading LINGO, 220 DT (decision theory) Bayes’s theorem, 321-328 decision-making under certainty, 306 maximax criterion, 306 maximin criterion, 307 decision-making under risk, 307 EV (expected value) criterion, 308-309 expected opportunity loss criterion, 309-311 origin of probabilities, 308 decision-making under uncertainty, 311 Hurwicz criterion, 312-313 Laplace criterion, 311-312 maximax criterion, 312 maximin criterion, 312 minimax criterion, 313-315 enhancing decision-making with business analytics, 23-24 EVPI (expected value of perfect information), 315 model elements, 304 model formulation, 305-306 overview, 122, 303 practice problems, 328-333 sequential decisions and decision trees, 317-320 types of decision environments, 304-305 duality duality practice problems, 259-261 economic value of resources, determining, 258-259

informational value of, 242 overview, 241 primal maximization problems, 243-251 primal minimization problems, 251-258 Dun & Bradstreet, 35 duplication, checking for, 57 Durbin-Watson Autocorrelation Test, 284

E economic data sources, 35 economic value of resources, determining, 258-259 ensuring data quality, 55-57 enterprise resource planning (ERP) systems, 34 Equifax, 35 ERP (enterprise resource planning) systems, 34 errors confidence intervals, 76-77 error metrics, 291-292 establishing information policy, 54 estimating sampling, 76-77 EV (expected value) criterion, 308-309 EVPI (expected value of perfect information), 315 Excel computer-based solution with simplex method, 224-227 LP (Linear Programming) solutions infeasible solutions, 229 practice problems, 233-238 unbounded solutions, 227-228 marketing/planning case study example case study background, 81, 103 descriptive analytics analysis, 82-90 predictive analytics analysis, 104-114 solution for LP marketing/planning model, 132-133 primal maximization problems, 243-251 primal minimization problems, 251-258 simple regression model, 277-280 supply chain shipping problem case study, 144

t-test statistics, 197 ZOP (zero-one programming) problems/models, solving, 268-269 executive sponsorship, lack of, 51 expected opportunity loss criterion, 309-311 expected value (EV) criterion, 308-309 expected value of perfect information (EVPI), 315 experiments, 177 exponential probability distribution, 190-192 exponential smoothing example of, 285 simple model, 284-285 smoothing averages, 286-288 external data sources, 34-35

F factorials, 168 failures failure to deliver, 53 failure to provide value, 53 reasons for BA initiative and organization failure, 50-51 reasons for team failures, 53 farming problem example (LP), 212-213 Federal Division problem example (LP), 215-217 finiteness in LP (Linear Programming) models, 232 fitting models to data, 288-289 forecasting additive time series model, 274 data mining, 39, 99 exponential smoothing example of, 285 simple model, 284-285 fitting models to data, 288-289 forecasting accuracy statistics, 291-292 MAD (mean absolute deviation), 291-292 MAPE (mean absolute percentage error), 292 MSE (mean square error), 291-292 forecasting methods, 275-276

marketing/planning case study example, 112 multiple regression models, 281 application, 282-283 limitations in forecasting time series data, 283-284 multiplicative time series model, 274 overview, 97, 271 practice problems, 292-293 simple regression model computer-based solution, 277-280 model for trend, 276 statistical assumptions and rules, 280-281 smoothing averages, 286-288 supply chain shipping problem case study developing forecasting models, 147-154 resulting warehouse customer demand forecasts, 157 validating forecasting models, 155-157 time series data, variation in cyclical variation, 275 random variation, 275 seasonal variation, 274 trend variation, 274 variation in time series data, 272-274 formulating DT (decision theory) models, 305-306 F-ratio statistic, 110 Frequency Theory, 171-172 F-Test Two-Sample for Variances tool, 195 functional organization structure, 48 functions, objective, 203-204

G generalized LP (Linear Programming) model, 202 geometric probability distribution, 184 given requirements, stating, 131, 206 goals, 59 Google Insights for Search, 39 Google Trends, 39

hardware, 37 hierarchical clustering, 100 hierarchical relationships, 46 histograms, 66 human resources decisions, 23 human resources data, 34 lack of, 51 Hurwicz criterion, 312-313 hypergeometric probability distribution, 184 hypothesis testing, 193-199

I IBM’s SPSS software, 40 IMF (International Monetary Fund), 35 implementation specialists, 52 importance of business analytics applications to enhance decision-making, 23-24 new sources of data, 23-25 overview, 17-18 providing answers to questions, 18-20 strategy for competitive advantage, 20-21 inability to delegate responsibility, 51 inability to prove success, 53 inconsistent values, checking for, 57 increasing customer profitability, 24 infeasible solutions, 229 influence diagrams, 95-96 information policy, establishing, 54 information technology (IT) computer hardware, 36 computer software, 36 data management technology, 37 data marts, 38 data mining, 38-40 data warehouses, 38 database encyclopedia content, 36

DBMS (database management systems), 37-36 infrastructure, 37 networking and telecommunications technology, 37 INFORMS, 30 innovation, achieving with business analytics, 21 Insufficient Reason, Principle of, 172 integer programming. See IP (integer programming) integrated processes, lack of, 51 internal data sources, 34 International Monetary Fund (IMF), 35 interval data, 8 IP (integer programming), 121, 263 explained, 263-264 IP problems/models, solving, 264 maximization IP problem, 265-266 minimization IP problem, 266-267 practice problems, 270 ZOP (zero-one programming) explained, 264 problems/models, solving, 268-269 IT (information technology) computer hardware, 37 computer software, 37 data management technology, 37 data marts, 38 data mining, 38-40 data warehouses, 38 database encyclopedia content, 36 DBMS (database management systems), 37-36 infrastructure, 37 networking and telecommunications technology, 37

J-K judgment sampling, 74 justification, lack of, 53 K-mean clustering, 101-102 Kolmogorov-Smirnov (One-Way) tests, 199

L Laplace criterion, 311-312 leadership, lack of, 50 limited context perception, 50 Lindo Systems LINGO. See LINGO line charts explained, 66 marketing/planning case study example case study background, 81 descriptive analytics analysis, 82-90 Linear Programming. See LP (Linear Programming) linearity in LP (Linear Programming) models, 232 LINGO, 40 downloading, 220 IP problems/models, solving maximization IP problem, 265-266 minimization IP problem, 266-267 LP (Linear Programming) solutions computer-based solution with simplex method, 220-224 infeasible solutions, 229 marketing/planning case study example, 132-133 practice problems, 233-238 unbounded solutions, 227-228 overview, 40 primal maximization problems, 243-251 primal minimization problems, 251-258 supply chain shipping problem case study, 159-161 trial versions, 220 ZOP (zero-one programming) problems/models, solving, 268-269 little data, 3 logic-driven models, 94-96 logistic regression, 100 loss values, expected opportunity loss criterion, 309-311 LP (Linear Programming) applied LP model, 121, 202

blending formulations, 230 computer-based solutions with simplex method, 217-218 Excel solution, 224-227 LINGO solution, 220-224 simplex variables, 218-220 constraints, 204-206 duality duality practice problems, 259-261 economic value of resources, determining, 258-259 informational value of, 242 overview, 241 primal maximization problems, 243-251 primal minimization problems, 251-258 sensitivity analysis, 242-243 generalized LP model, 202 infeasible solutions, 229 maximization models, 201-202 minimization models, 201-202 multidimensional decision variable formulations, 231 necessary assumptions, 232 nonnegativity and given requirements, 206 objective function, 203-204 overview, 201-202 practice problems, 233-238 problem/model formulation butcher problem example, 208-210 Clarke Special Parts problem example, 214-215 customer service problem example, 213-214 diet problem example, 210-212 farming problem example, 212-213 Federal Division problem example, 215-217 stepwise procedure, 207-208 unbounded solutions, 227-228

M MAD (mean absolute deviation), 155-157, 162, 291-292 management issues, 54 change management, 58-59

best practices, 59 targets, 59 ensuring data quality, 55-57 establishing information policy, 54 measuring business analytics contribution, 58 outsourcing business analytics, 55 advantages of, 55 disadvantages of, 56 MAPE (mean absolute percentage error), 292 marginal probability, 321 marketing/planning case study example, 80-90 case study background, 81, 103, 129 descriptive analytics analysis, 82-90 predictive analytics analysis, 104-114 Excel best variable combination regression model and statistics, 113 Excel POS regression model, 108 Excel radio regression model, 109 Excel TV regression model, 109 forecasting model, 112 F-ratio statistic, 110 R-Square statistics, 110-111 SPSS best variable combination regression model and statistics, 106 SPSS Pearson correlation coefficients, 104 SPSS POS regression model, 106 SPSS radio regression model, 107 SPSS TV regression model, 108 prescriptive analysis, 102-103, 129-134 final comments, 133-134 formulation of LP marketing/planning model, 130-131 solution for LP marketing/planning model, 132-133 matrix organization structure, 48-49 maximax criterion, 306, 312 maximin criterion, 307, 312 maximization IP problem, solving, 265-266 maximization models LP (Linear Programming), 201-202 primal maximization problems, 243-251

maximum/minimum, 68 mean, 68 mean absolute deviation (MAD), 155-157, 162, 291-292 mean absolute percentage error (MAPE), 292 mean square error (MSE), 291-292 measured performance, 59 measuring business analytics contribution, 58 median, 68 merchandize strategy optimization, 23 methods, sampling, 73-75 Microsoft Excel, 39 minimax criterion, 313-315 minimization IP problem, solving, 266-267 minimization models LP (Linear Programming), 201-202 primal minimization problems, 251-258 minimum/maximum, 68 mobile analytics, 25 mode, 68 modeling constrained optimization models, 128-129 DT (decision theory) decision environments, 304-305 model elements, 304 model formulation, 305-306 overview, 303 exponential smoothing example of, 285 simple model, 284-285 forecasting models developing, 147-154 exponential smoothing, 284-285 fitting models to data, 288-289 forecasting accuracy statistics, 291-292 forecasting methods, 275-276 multiple regression models, 281-284

practice problems, 292-293 sample warehouse customer demand forecasts, 157 simple regression model, 276-281 smoothing averages, 286-288 statistical assumptions and rules, 280-281 validating, 155-157 LP (Linear Programming) applied LP model, 202 blending formulations, 230 computer-based solutions with simplex method, 217-227 constraints, 204-206 generalized LP model, 202 infeasible solutions, 229 maximization models, 201-202 minimization models, 201-202 multidimensional decision variable formulations, 231 necessary assumptions, 232 nonnegativity and given requirements, 206 objective function, 203-204 problem/model formulation, 207-217 unbounded solutions, 227-228 predictive modeling data-driven models, 96-97 logic-driven models, 94-96 prescriptive modeling, 120-122 case studies, 122 decision analysis, 122 integer programming. See integer programming linear programming. See LP (Linear Programming) nonlinear optimization, 121, 122-129 other methodologies, 122 simulation, 122, 295 deterministic simulation, 295-296 practice problems, 301 probabilistic simulation, 296-301 variation in time series data additive time series model, 274

cyclical variation, 275 multiplicative time series model, 274 random variation, 275 seasonal variation, 274 trend variation, 274 monitoring analysts, 52 Monte Carlo simulation method application, 298-301 procedure, 296-298 MSE (mean square error), 291-292 multidimensional decision variable formulations, 231 multiple regression models, 9, 281 application, 282-283 limitations in forecasting time series data, 283-284 multiplication, rules of, 174-177 multiplicative time series model, 274

N N function, 67 need for business analytics applications to enhance decision-making, 23-24 new sources of data, 23-25 overview, 17-18 providing answers to questions, 18-20 strategy for competitive advantage, 20-21 networking and telecommunications technology, 37 neural networks, 100 new sources of data, applying business analytics to, 23-25 Nielsen data, 35 nonlinear optimization, 121, 122-129 calculus methods, 129 curve fitting, 123-129, 288-289 quadratic programming, 128-129 nonnegativity, 131, 206 nonparametric hypothesis testing, 200-199 normal probability distribution, 186-189

objective function, 203-204 ODMP (organization decision-making process), 10-12 operations efficiency, achieving with business analytics, 21 optimization, nonlinear, 121, 122-129 calculus methods, 129 curve fitting, 123-129, 288-289 quadratic programming, 128-129 ordinal data, 8 organization decision-making process (ODMP), 10-12 organization structures, 45-50 centralized BA organization structure, 49-50 functional organization structure, 48 hierarchical relationships, 46 matrix organization structure, 48 project structure, 47-48 reasons for BA initiative and organization failure, 51-50 as target of change management, 59 organizational planning, 20 origin of probabilities, 308 outcomes, 177 outliers, checking for, 57 outsourcing business analytics, 55 advantages of, 55 disadvantages of, 55-56

P parametric hypothesis testing, 195-197 payoffs (DT), 304 period sampling, 74 permutations, 167-168 personnel, 30-33 administrators, 31 BAP (Business Analytics Professional) exam, 30-31 designers, 31 developers, 31 skills and competency requirements, 32-33 solution experts, 31

as target of change management, 59 technical specialists, 31 physical proximity, belief of, 50 pie charts, 66 planning, organizational, 20 Poisson probability distribution, 182-184 policy, information policy, 54 practice problems DT (decision theory), 328-333 forecasting, 292-293 IP (integer programming), 270 LP (Linear Programming), 233-238 simulation, 301 predictive analytics analytic purposes and tools, 5 data mining, 97-98 methodologies, 99-102 simple illustration of, 98-99 data-driven models, 96-97 definition of, 4 logic-driven models, 94-96 marketing/planning case study example, 102 case study background, 103 predictive analytics analysis, 104-114 overview, 93-94 supply chain shipping problem case study, 147-157 developing forecasting models, 147-154 problem background and data, 140 resulting warehouse customer demand forecasts, 157 validating forecasting models, 155-157 predictive modeling, logic-driven models, 94-96 prescriptive analytics analytic purposes and tools, 5 definition of, 4 marketing/planning case study example case study background, 129 prescriptive analysis, 129-134

methodologies, 119-120 prescriptive modeling, 120-122 case studies, 122 decision analysis, 122 integer programming. See integer programming linear programming. See LP (Linear Programming) nonlinear optimization, 121, 122-129 other methodologies, 122 simulation, 122 supply chain shipping problem case study, 158-163 demonstrating business performance improvement, 162-163 determining optimal shipping schedule, 159-161 problem background and data, 140 selecting and developing optimization shipping model, 158-159 summary of BA procedure for manufacturer, 161-162 prescriptive modeling, 120-122 case studies, 122 decision analysis, 122 integer programming, 122 IP (integer programming) explained, 263-264 IP problems/models, solving, 264-267 practice problems, 270 ZOP (zero-one programming) problems/models, solving, 264, 268-269 linear programming. See LP (Linear Programming) nonlinear optimization, 121, 122-129 calculus methods, 129 curve fitting, 123-129, 288-289 quadratic programming, 128-129 other methodologies, 122 simulation, 122 price leadership, achieving with business analytics, 21 primal maximization problems, 243-251 primal minimization problems, 251-258 Principle of Insufficient Reason, 172 privacy (data), 35-36 probabilistic simulation

Monte Carlo simulation method application, 298-301 procedure, 296-298 overview, 296 probability. See also DT (decision theory) Bayes’s theorem, 321-328 marginal probability, 321 Monte Carlo simulation method, application, 298-301 origin of probabilities, 308 probabilistic simulation, 296 Monte Carlo simulation method procedure, 296-298 overview, 296 probability concepts, 171 Frequency Theory, 171-172 Principle of Insufficient Reason, 172 rules of addition, 173-174 rules of multiplication, 174-177 probability distributions, 177-178 binomial probability distribution, 179-181 exponential probability distribution, 190-192 geometric probability distribution, 184 hypergeometric probability distribution, 184 normal probability distribution, 186-189 Poisson probability distribution, 182-184 random variables, 177 probability distributions, 78-80, 97, 177-178 continuous probability distributions, 185-192 exponential probability distribution, 190normal probability distribution, 186-189 discrete probability distributions, 178-184 binomial probability distribution, 179-181 geometric probability distribution, 184 hypergeometric probability distribution, 184 Poisson probability distribution, 182-184 random variables, 177 process of business analytics data measurement scales, 8

explained, 7-10 integrated processes, lack of, 51 relationship with organization decision-making process (ODMP), 10-12 product data, 34 product differentiation, achieving with business analytics, 21 production data, 34 profit, calculating, 96 project structure, 47-48 providing answers to questions, 18-20

Q quadratic programming, 127-129 quality of data ensuring, 56-57 overview, 35-36 Query Drilldown, 8 questionnaires, 34 questions business analytics seeks to answer, 18 quota sampling, 74

R random variables, 177 random variation, 275 range, 68 ratio data, 8 reducing risk, 24 regression analysis, 97 Durbin-Watson Autocorrelation Test, 284 multiple regression models, 281 application, 282-283 limitations in forecasting time series data, 283-284 simple regression model computer-based solution, 277-280 model for trend, 276-281 statistical assumptions and rules, 280-281 relevance, checking for, 57 repetitions, 170 responsibility, inability to delegate, 51

risk decision-making under risk, 307 EV (expected value) criterion, 308-309 expected opportunity loss criterion, 309-311 origin of probabilities, 308 explained, 304 risk reduction, 23 roles (team), 52 R-Square statistics, 110-111 rules of addition, 173-174 rules of multiplication, 174-177 run testing, 199

S sampling sample variance, 69 sampling estimation, 76-77, 97 sampling methods, 73-75 SAS Analytics Pro, 7, 40 scatter charts, 66 seasonal variation, 274 sensitivity analysis economic value of resources, determining, 258-259 overview, 242-243 primal maximization problems, 243-251 primal minimization problems, 251-258 sequences data mining, 39, 99 sequential decisions and decision trees, 317-320 sequential decisions, 317-320 senior management support, 59 service effectiveness, achieving with business analytics, 21 simple random sampling, 73 simple regression model computer-based solution, 277-280 model for trend, 276-281 statistical assumptions and rules, 280-281

simplex method, 217-218 Excel, 224-227 LINGO, 220-224 simplex variables, 218-220 artificial variables, 219 slack variables, 218-219 surplus variables, 219 simplex variables, 218-220 artificial variables, 219 slack variables, 218-219 surplus variables, 219 simulation, 97, 122, 295 computer simulation methods, 301 deterministic simulation, 295-296 practice problems, 301 probabilistic simulation, 296 Monte Carlo simulation method, 296-298 Monte Carlo simulation method application, 298-301 skewedness, 69 skill requirements for business analytics personnel, 32-33 slack variables, 218-219 smoothing averages, 286-288 social media analytics, 23-25 software, 37. See also specific software solution experts, 31 Solver, 39 SPSS software, 40 Curve Estimation, 288-289 Curve Fitting, 123-129, 148-153 K-Mean cluster software, 101-102 marketing/planning case study example case study background, 81, 103 descriptive analytics analysis, 82-90 predictive analytics analysis, 104-114 simple regression model, 277-280 supply chain shipping problem case study, 138 t-test statistics, 197

standard deviation, 68 standard error, 69 standard normal probability distribution, 78 states of nature (DT), 304 statistical charts, 65-67 statistical testing, 193-199 statistical tools, 167 counting, 167 combinations, 169 permutations, 167-168 repetitions, 170 descriptive statistics, 67-72 probability rules of addition, 173-174 rules of multiplication, 174-177 probability concepts, 171 conditional probabilities, 176 Frequency Theory, 171-172 Principle of Insufficient Reason, 172 probability distributions, 177-178 binomial probability distribution, 179-181 exponential probability distribution, 190-192 geometric probability distribution, 184 hypergeometric probability distribution, 184 normal probability distribution, 186-189 Poisson probability distribution, 182-184 random variables, 177 statistical charts, 64-67 statistical testing, 193-199 strategy for competitive advantage, 20-21 stratified random sampling, 73 structured data analytics, 25 success, proving, 53 sum, 67 supply chain shipping problem case study descriptive analytics analysis, 141-145 actual monthly customer demand in motors, 143

Chicago customer demand (graph), 143 estimated shipping costs per motor, 141 Excel summary statistics of actual monthly customer demand in motors, 144 Houston customer demand (graph), 143 Kansas City customer demand (graph), 145 Little Rock customer demand (graph), 145 Oklahoma City customer demand (graph), 145 Omaha customer demand (graph), 145 SPSS summary statistics of actual monthly customer demand in motors, 144 predictive analytics analysis, 147-157 developing forecasting models, 147-154 resulting warehouse customer demand forecasts, 157 validating forecasting models, 155-157 prescriptive analysis, 158-163 demonstrating business performance improvement, 162-163 determining optimal shipping schedule, 159-161 selecting and developing optimization shipping model, 158-159 summary of BA procedure for manufacturer, 161-162 problem background and data, 139-140 support, lack of, 50 surplus variables, 219 sustainability, achieving with business analytics, 21 systematic random sampling, 73

T targets of change management, 59 tasks as target of change management, 59 teams, 51-53 collaboration, 51-53 participant roles, 52 reasons for team failures, 53 technical specialists, 31 technology as target of change management, 59 testing Durbin-Watson Autocorrelation Test, 284

statistical testing, 193-199 text analytics, 23-25 time series data exponential smoothing example of, 285 simple model, 284-285 smoothing averages, 286-288 multiple regression models, 283-284 simple regression model additive model, 274 cyclical variation, 275 multiplicative model, 274 overview, 272-274 random variation, 275 seasonal variation, 274 trend variation, 274 trend simple regression model, 276-281 trend variation, 274 trials, 177 t-test: Paired Two Sample Means, 195

U unbounded solutions, 227-228 uncertainty decision-making under uncertainty, 311 Hurwicz criterion, 312-313 laplace criterion, 311-312 maximax criterion, 312 maximin criterion, 312 minimax criterion, 313-315 explained, 305 U.S. Census, 35

V validating forecasting models, 155-157 value EV (expected value) criterion, 308-309

EVPI (expected value of perfect information), 315 expected opportunity loss criterion, 309-311 failure to provide value, 53 inconsistent values, checking for, 57 variables slack variables, 218-219 surplus variables, 219 variance, 68, 219 variation in time series data, 272-274 visualizing data marketing/planning case study example case study background, 81 descriptive analytics analysis, 82-90 statistical charts, 65-67

W warehouses (data), 38 web logs, 34 web mining, 39 Wilcoxon Signed-Rank tests, 199

X-Y-Z Z values, 78-79 zero-one programming (ZOP) model explained, 264 problems/models, solving, 268-269