Poisson (counts)

A Poisson model is used to model counts of things, such as the number of insurance claims filed in a given month, the number of calls received by a call center in a given minute, or the number of orders placed for a particular item. The Poisson distribution is a natural choice for count data because it is defined only on the non-negative integers (0, 1, 2, and so on), with no upper bound. The classic way of fitting a Poisson model in R is through the glm() function with the Poisson family, which uses a log link function by default:

model.poisson <- glm(count ~ v1 + v2 + v3, data = inputdata, family = poisson())

Note that the preceding specification merely shows the model in a generalized form. Do not try to run it as is: count, v1, v2, and v3 are placeholder variable names, and inputdata is a placeholder for a dataset that does not exist. However, what the specification says is that you will run a Poisson model as follows:

  • The model is run via the glm() function, with the dependent variable to the left of the ~
  • The independent variables are supplied to the right of the ~, separated by +
  • The data= parameter supplies the input dataset
  • The family= parameter specifies the type of generalized linear model that you will be running; in this case, a Poisson model
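The general form above can be tried on made-up data. The following is a minimal sketch, assuming a simulated data frame: the names inputdata, count, v1, v2, and v3 mirror the placeholders in the specification, and the coefficient values used to generate the counts are arbitrary choices made only for illustration.

```r
# Simulate a small dataset so the generic specification becomes runnable.
# All values below are invented for illustration only.
set.seed(123)
n <- 500
inputdata <- data.frame(
  v1 = rnorm(n),
  v2 = rnorm(n),
  v3 = rbinom(n, size = 1, prob = 0.5)
)

# Counts are drawn from a Poisson distribution whose mean depends on the
# predictors through a log link (the default for family = poisson()).
true_mean <- exp(0.5 + 0.3 * inputdata$v1 - 0.2 * inputdata$v2 + 0.4 * inputdata$v3)
inputdata$count <- rpois(n, lambda = true_mean)

# Fit the model using the same formula as the general specification.
model.poisson <- glm(count ~ v1 + v2 + v3, data = inputdata, family = poisson())

# The estimated coefficients should be reasonably close to the values
# used in the simulation (0.5, 0.3, -0.2, 0.4).
coef(model.poisson)
```

Since the counts were generated from a known model, comparing coef(model.poisson) with the simulation values is a quick sanity check that the formula, data=, and family= arguments are wired up as intended.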

To try a Poisson model on real data, we can use the warpbreaks data, which is included with R.

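  • First, at the console, enter help(warpbreaks) to get a description of the dataset. It records the number of warp breaks per loom (breaks) for two types of wool (A and B) and three levels of tension (L, M, and H).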
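Besides the help page, a few standard base R calls give a quick look at the data before modeling. This is a small optional sketch that uses only functions shipped with R.

```r
# warpbreaks ships with base R (datasets package), so nothing needs loading.
str(warpbreaks)              # breaks is numeric; wool and tension are factors
head(warpbreaks)             # first few rows
summary(warpbreaks$breaks)   # spread of the count outcome

# Number of observations in each wool/tension combination
table(warpbreaks$wool, warpbreaks$tension)
```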

Then, set up and execute the model using the glm() function. We are predicting the number of breaks from the type of wool and the level of tension. Note that we need to call summary() on the fitted model in order to see the output, since running glm() just assigns the result to an object named model.poisson.

model.poisson <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)
summary(model.poisson)
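
Because the Poisson family uses a log link by default, the coefficients reported by summary() are on the log scale. A common follow-up, sketched here as one option rather than a required step, is to exponentiate them so that they read as multiplicative effects on the expected number of breaks, and to obtain predicted counts with predict(). The object names rate.ratios and newdata are introduced only for this illustration.

```r
# Refit the model so this block runs on its own.
model.poisson <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Exponentiated coefficients: multiplicative change in the expected count
# relative to the reference levels (wool A, tension L).
rate.ratios <- exp(coef(model.poisson))
rate.ratios

# Predicted number of breaks for every wool/tension combination.
newdata <- expand.grid(wool = levels(warpbreaks$wool),
                       tension = levels(warpbreaks$tension))
newdata$predicted <- predict(model.poisson, newdata = newdata, type = "response")
newdata
```

A rate ratio below 1 indicates fewer expected breaks than at the reference level, and a value above 1 indicates more. Passing type = "response" makes predict() return values on the count scale instead of the log scale.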