What this book covers

Chapter 1, Introduction to Data Mining, introduces the notion of data mining and the CRISP-DM process model. You will learn what data mining is, why you would want to use it, and some of the types of questions you could answer with data mining.

Chapter 2, The Basics of Using IBM SPSS Modeler, introduces the Modeler graphical user interface. You will learn where the different components of the program are located, how to work with nodes and create streams, and how to use the various help options.

Chapter 3, Importing Data into Modeler, introduces the general data structure that is used in Modeler. You will learn how to read and display data, and you will be introduced to the concepts of measurement level and field roles.

Chapter 4, Data Quality and Exploration, focuses on the Data Understanding phase of data mining, in which we explore our data and assess its quality. This chapter introduces the Data Audit node, which is used to explore and assess data. You will see this node's options and learn how to interpret its results. You will also be introduced to the concept of missing data and shown ways to address it.

Chapter 5, Cleaning and Selecting Data, introduces the Data Preparation phase, in which we fix some of the problems identified during the Data Understanding phase. You will be shown how to select the appropriate cases for analysis, sort cases to get a better feel for the data, identify and remove duplicate cases, and reclassify categorical values to address various types of issues.

Chapter 6, Combining Data Files, continues with the Data Preparation phase of data mining by filtering fields and combining different types of data files.

Chapter 7, Deriving New Fields, introduces the Derive node. The Derive node performs different types of calculations to create new fields, allowing users to extract more information from the data. These new fields can provide insights that were not apparent in the original data. In this chapter, you will learn that the Derive node can create fields as formulas, flags, nominals, or conditionals.

Chapter 8, Looking for Relationships between Fields, focuses on discovering simple relationships between an outcome variable and a predictor variable. You will learn how to use several statistical and graphing nodes to determine which fields are related to each other. Specifically, you will learn to use the Distribution and Matrix nodes to assess the relationship between two categorical variables, and the Histogram and Means nodes to identify the relationship between categorical and continuous fields. Finally, you will use the Plot and Statistics nodes to investigate relationships between continuous fields.

Chapter 9, Introduction to Modeling Options in IBM SPSS Modeler, introduces the different types of models available in Modeler and then provides an overview of the predictive models. Readers will also be introduced to the Partition node, which is used to create Training and Testing datasets.

Chapter 10, Decision Tree Models, introduces readers to decision tree theory. It then provides an overview of the CHAID model so that readers become familiar with its theory, dialogs, and results.

Chapter 11, Model Assessment and Scoring, discusses different ways of assessing the results of a model once it has been built. Readers will also learn how to score new data and how to export the resulting predictions.