Preface

We are proud to present this intentionally short book on the essentials of using IBM SPSS Modeler. Data science and predictive analytics are hot topics right now, and this book might be perceived as being inspired by these new and exciting trends. While we certainly hope to attract a variety of readers, including young practitioners who are new to the field, the contents of this book have in fact been shaped by forces that have been unfolding over a period of approximately 25 years.

In 1992, Colin Shearer and his colleagues, then at ISL, were finding, as Colin himself described it, that data mining projects involved a lot of hard work, and that most of that work was boring. Specifically, to get to the rewarding task of finding patterns using the modeling algorithms, you had to do a lot of repetitive preparatory work. It was this observation, that virtually all data mining projects share some of the same routine operations, that gave birth to the idea of the first data mining workbench, Clementine (now called IBM SPSS Modeler). The software was designed to make the repetitive tasks as quick and easy as possible. That same observation is at the heart of this book. We have carefully chosen those tasks that apply to nearly all Modeler projects. For that reason, this book is decidedly not encyclopedic, and we sincerely hope that you will outgrow it in short order and move on to explore Modeler's powerful collection of more advanced features.

Another inspiration for this book is the history of Clementine documentation and training from the early 1990s to the present. Given the motivation behind the software, early documentation often focused on short, simple examples that could be carefully followed and then imitated in real-world examples, even though the real-world applications were always much more complex. Some of the earliest examples of this were the original Clementine Application Templates (ISL CATs) from the 1990s, which have evolved so much as to be unrecognizable.

The two of us first encountered Modeler as members of the SPSS community in the period between SPSS's acquisition of ISL (1998) and IBM's acquisition of SPSS (2009). We were both extensively involved in Modeler training for SPSS. Jesus was the training curriculum lead for IBM SPSS at one point after the acquisition. It soon became clear that training in Modeler was going to evolve after the acquisition and more and more entities were going to be involved in training. Some years later, we found ourselves working together at an IBM partner and built a complete SPSS Statistics and SPSS Modeler curriculum for that company. We have spent hundreds of hours discussing Modeler training and thousands of hours conducting Modeler training. We are passionate about how to create the ideal experience for new users of Modeler. We anticipate that the readers of this book will be brand new users engaged in self-study, students in classes that use Modeler, or participants in short courses and seminars such as the ones that we have taught for years.

In 2010, also in response to the changing marketplace after the IBM acquisition, Tom Khabaza (data mining pioneer and one of the earliest members of the ISL/Clementine team) and Keith started a dialog about a possible rookie book about SPSS Modeler. They knew that Modeler might be reaching new audiences. They had spirited discussions and produced a detailed outline, but the project never quite got off the ground. In 2011, without any knowledge of this beginner's guide concept, Packt reached out to Keith and wanted him to recruit others to write a more advanced Modeler book in a cookbook format. At first, Tom and Keith resisted because they thought that a beginner's guide was badly needed and they had an existing plan. However, it all worked out in the end. They combined forces with almost a dozen Modeler experts, including Colin Shearer, who kindly wrote the foreword. Jesus and other experts we knew joined as either co-authors or technical reviewers. The success of the IBM SPSS Modeler Cookbook (2013) demonstrated that more advanced content was also needed.

This book would have been completely different if it had been written before the cookbook. Knowing that the cookbook exists has allowed us to stick to our goal of writing a quick and easy read with only the absolute essentials. It has been designed to dovetail nicely with the cookbook and serve as a kind of prequel. In designing this book, we were well aware that many people who read it might use our IBM SPSS Modeler Essentials Packt video course as a companion. Since we tried to prioritize the absolute essentials in both, they necessarily cover similar ground. However, we chose different case study datasets for each, precisely to support the kind of learning that would come from working through both. We truly believe that they complement each other.

In that spirit, we have chosen a single case study to use throughout the book. It is just complex enough to suit our purposes, but clearly falls short of the complexity of a real-world example. This is a conscious decision. Work through this book. It is designed to be an experience, and not just a read, so follow it step by step from cover to cover. While we hope this book may also be useful to refer to later, we are trying to craft a positive (and easy) first-time experience with Modeler. Also, although we offer a sufficiently complex dataset to show the essentials, we do not attempt to fashion an elaborate scenario to place the dataset into a business context. This is also a conscious decision. We felt that a book on the essentials of Modeler should be much more of a point-and-click book than a theory book. So if you want a book that emphasizes theory over practice, this may not be the best choice to begin your journey. We do rehearse the basic steps behind how modeling works in Modeler, but given the book's length, there is simply no room to discuss all the algorithms and the theory behind them. We spend virtually all of the book's pages on Data Understanding, Data Preparation, Modeling, and Model Assessment, and spend virtually no pages on Business Understanding, Business Evaluation, and Deployment. Having said that, we care deeply about helping the reader understand why they are performing each step, and will always place the point-and-click steps in a proper context. That is why we are so carefully selective about how many steps, and which steps, we include in this short book.

IBM SPSS Modeler enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly, allowing your organization to base its decisions purely on the insights obtained from your data. It is our hope that you enjoy mining your data with Modeler and that this book serves as your guide to get you started on this journey. We sincerely hope that you enjoy learning from this book as much as we have enjoyed teaching its content.