An example of parameter estimation
In this example, we will see how the EM algorithm can be applied to estimate unknown parameters (inspired by an example discussed in the original paper: Maximum Likelihood from Incomplete Data via the EM Algorithm, Dempster A. P., Laird N. M., Rubin D. B., Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977).
Let's consider a sequence of n independent experiments modeled with a multinomial distribution with three possible outcomes x1, x2, x3 and corresponding probabilities p1, p2 and p3. The probability mass function is as follows:
P(x_1, x_2, x_3; p_1, p_2, p_3) = \frac{n!}{x_1!\, x_2!\, x_3!}\, p_1^{x_1} p_2^{x_2} p_3^{x_3}
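The multinomial PMF stated above can be checked numerically. The following is a minimal sketch (the function name `multinomial_pmf` is mine, not from the text) using only the standard library:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    # P(x1, ..., xk) = n! / (x1! ... xk!) * p1^x1 * ... * pk^xk
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)
    value = float(coeff)
    for x, p in zip(counts, probs):
        value *= p ** x
    return value

# Example with p = (1/3, 1/2, 1/6) and n = 6
print(multinomial_pmf([2, 3, 1], [1/3, 1/2, 1/6]))
```

Summing the PMF over all outcome vectors with a fixed n must yield 1, which is a quick way to validate the implementation.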
Let's suppose that we can observe z1 = x1 + x2 and x3, but we don't have any direct access to the individual values x1 and x2. Therefore, x1 and x2 are latent variables, while z1 and x3 are observed ones. The probability vector p is parameterized in the following way:
p = \left(\frac{\theta}{6},\; 1 - \frac{\theta}{4},\; \frac{\theta}{12}\right)^T
Our goal is to find the MLE for θ given n, z1, and x3. Let's start by computing the complete-data log-likelihood:
L(\theta; x) = \log n! - \log x_1! - \log x_2! - \log x_3! + x_1 \log\frac{\theta}{6} + x_2 \log\left(1 - \frac{\theta}{4}\right) + x_3 \log\frac{\theta}{12}
We can derive the expression for the corresponding Q function by exploiting the linearity of the expected value operator E[·]:
Q(\theta|\theta_t) = E[x_1|z_1, \theta_t] \log\frac{\theta}{6} + E[x_2|z_1, \theta_t] \log\left(1 - \frac{\theta}{4}\right) + x_3 \log\frac{\theta}{12} + \text{const.}
Given z1, the variables x1 and x2 are binomially distributed, with success probabilities that depend on θt (so they must be recomputed at each iteration). Hence, the expected value of x1(t+1) becomes as follows:
E\left[x_1^{(t+1)}\right] = z_1 \frac{\theta_t/6}{\theta_t/6 + 1 - \theta_t/4} = \frac{2 z_1 \theta_t}{12 - \theta_t}
Similarly, the expected value of x2(t+1) is as follows:
E\left[x_2^{(t+1)}\right] = z_1 \frac{1 - \theta_t/4}{\theta_t/6 + 1 - \theta_t/4} = \frac{3 z_1 (4 - \theta_t)}{12 - \theta_t}
If we plug these expressions into the Q function and compute the derivative with respect to θ, we get the following:
\frac{\partial Q}{\partial \theta} = \frac{E\left[x_1^{(t+1)}\right] + x_3}{\theta} - \frac{E\left[x_2^{(t+1)}\right]}{4 - \theta} = 0
Therefore, solving for θ, we get the following:
\theta = \frac{4\left(E\left[x_1^{(t+1)}\right] + x_3\right)}{E\left[x_1^{(t+1)}\right] + E\left[x_2^{(t+1)}\right] + x_3}
At this point, we can derive the iterative expression for θ:
\theta_{t+1} = \frac{8 z_1 \theta_t + 4 x_3 (12 - \theta_t)}{(z_1 + x_3)(12 - \theta_t)}
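As a sanity check of my own (not in the original text), performing the E-step and M-step explicitly must yield exactly the same next value as the fused iterative update, because E[x1] + E[x2] = z1 simplifies the M-step denominator:

```python
def em_step(theta_t, z1=50.0, x3=10.0):
    # E-step: expected latent counts given z1 and the current theta
    e_x1 = (2.0 * z1 * theta_t) / (12.0 - theta_t)
    e_x2 = (3.0 * z1 * (4.0 - theta_t)) / (12.0 - theta_t)
    # M-step: theta = 4 * (E[x1] + x3) / (E[x1] + E[x2] + x3)
    return 4.0 * (e_x1 + x3) / (e_x1 + e_x2 + x3)

def theta_fused(theta_prev, z1=50.0, x3=10.0):
    # Single-formula update obtained by substituting the E-step into the M-step
    num = (8.0 * z1 * theta_prev) + (4.0 * x3 * (12.0 - theta_prev))
    den = (z1 + x3) * (12.0 - theta_prev)
    return num / den

print(em_step(0.01), theta_fused(0.01))  # both yield the same next value
```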
Let's compute the value of θ for z1 = 50 and x3 = 10:
def theta(theta_prev, z1=50.0, x3=10.0):
    # Fused E-step + M-step update derived above
    num = (8.0 * z1 * theta_prev) + (4.0 * x3 * (12.0 - theta_prev))
    den = (z1 + x3) * (12.0 - theta_prev)
    return num / den

theta_v = 0.01

# Iterate the update until convergence
for i in range(1000):
    theta_v = theta(theta_v)

print(theta_v)
1.999999999999999
p = [theta_v/6.0, (1-(theta_v/4.0)), theta_v/12.0]
print(p)
[0.33333333333333315, 0.5000000000000002, 0.16666666666666657]
In this example, we parameterized all three probabilities as functions of θ and, considering that only z1 = x1 + x2 and x3 are observed, we have one degree of freedom in the choice of θ. The reader can repeat the example by fixing the value of p1 or p2 and leaving the remaining probabilities as functions of θ. The computation is almost identical, but in that case there are no residual degrees of freedom.
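As a final check of my own (not in the original text), the observed-data likelihood here depends on θ only through p(z1) = 1 − θ/12 and p(x3) = θ/12, so its MLE is available in closed form as θ* = 12·x3/(z1 + x3), and the EM fixed point must coincide with it:

```python
def theta(theta_prev, z1=50.0, x3=10.0):
    # Fused EM update from the example
    num = (8.0 * z1 * theta_prev) + (4.0 * x3 * (12.0 - theta_prev))
    den = (z1 + x3) * (12.0 - theta_prev)
    return num / den

z1, x3 = 50.0, 10.0

# Closed-form MLE of the observed-data likelihood
theta_star = 12.0 * x3 / (z1 + x3)
print(theta_star)  # 2.0

# EM converges to the same value
theta_v = 0.01
for _ in range(1000):
    theta_v = theta(theta_v)
print(theta_v)
```

This confirms that the iterative procedure is not just a fixed point of the update rule but the actual maximizer of the observed-data likelihood.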