An example of parameter estimation
In this example, we will see how the EM algorithm can be applied to estimate unknown parameters (inspired by an example discussed in the original paper: Maximum Likelihood from Incomplete Data via the EM Algorithm, Dempster A. P., Laird N. M., Rubin D. B., Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977).
Let's consider a sequence of n independent experiments modeled with a multinomial distribution with three possible outcomes x1, x2, x3 and corresponding probabilities p1, p2 and p3. The probability mass function is as follows:
P(x_1, x_2, x_3; p_1, p_2, p_3) = \frac{n!}{x_1!\, x_2!\, x_3!}\, p_1^{x_1} p_2^{x_2} p_3^{x_3}
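The multinomial PMF stated above can be checked numerically. The following is a minimal sketch (the function name `multinomial_pmf` is mine, not from the text) using only the standard library:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    # P(x1, ..., xk) = n! / (x1! ... xk!) * p1^x1 * ... * pk^xk
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)
    value = float(coeff)
    for x, p in zip(counts, probs):
        value *= p ** x
    return value

# Example with p = (1/3, 1/2, 1/6) and n = 6
print(multinomial_pmf([2, 3, 1], [1/3, 1/2, 1/6]))
```

Summing the PMF over all outcome vectors with a fixed n must yield 1, which is a quick way to validate the implementation.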
Let's suppose that we can observe z1 = x1 + x2 and x3, but we don't have any direct access to the individual values x1 and x2. Therefore, x1 and x2 are latent variables, while z1 and x3 are observed ones. The probability vector p is parameterized in the following way:
p = \left(\frac{\theta}{6},\; 1 - \frac{\theta}{4},\; \frac{\theta}{12}\right)^T
Our goal is to find the MLE for θ given n, z1, and x3. Let's start by computing the complete-data log-likelihood:
L(\theta; x) = \log n! - \log x_1! - \log x_2! - \log x_3! + x_1 \log\frac{\theta}{6} + x_2 \log\left(1 - \frac{\theta}{4}\right) + x_3 \log\frac{\theta}{12}
We can derive the expression for the corresponding Q function by exploiting the linearity of the expected value operator E[·]:
Q(\theta|\theta_t) = E[x_1|z_1, \theta_t] \log\frac{\theta}{6} + E[x_2|z_1, \theta_t] \log\left(1 - \frac{\theta}{4}\right) + x_3 \log\frac{\theta}{12} + \text{const.}
Given z1, the variables x1 and x2 are binomially distributed, with success probabilities that depend on θt (so they must be recomputed at each iteration). Hence, the expected value of x1(t+1) becomes as follows:
E\left[x_1^{(t+1)}\right] = z_1 \frac{\theta_t/6}{\theta_t/6 + 1 - \theta_t/4} = \frac{2 z_1 \theta_t}{12 - \theta_t}
Similarly, the expected value of x2(t+1) is as follows:
E\left[x_2^{(t+1)}\right] = z_1 \frac{1 - \theta_t/4}{\theta_t/6 + 1 - \theta_t/4} = \frac{3 z_1 (4 - \theta_t)}{12 - \theta_t}
If we plug these expressions into the Q function and compute the derivative with respect to θ, we get the following:
\frac{\partial Q}{\partial \theta} = \frac{E\left[x_1^{(t+1)}\right] + x_3}{\theta} - \frac{E\left[x_2^{(t+1)}\right]}{4 - \theta} = 0
Therefore, solving for θ, we get the following:
\theta = \frac{4\left(E\left[x_1^{(t+1)}\right] + x_3\right)}{E\left[x_1^{(t+1)}\right] + E\left[x_2^{(t+1)}\right] + x_3}
At this point, we can derive the iterative expression for θ:
\theta_{t+1} = \frac{8 z_1 \theta_t + 4 x_3 (12 - \theta_t)}{(z_1 + x_3)(12 - \theta_t)}
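As a sanity check of my own (not in the original text), performing the E-step and M-step explicitly must yield exactly the same next value as the fused iterative update, because E[x1] + E[x2] = z1 simplifies the M-step denominator:

```python
def em_step(theta_t, z1=50.0, x3=10.0):
    # E-step: expected latent counts given z1 and the current theta
    e_x1 = (2.0 * z1 * theta_t) / (12.0 - theta_t)
    e_x2 = (3.0 * z1 * (4.0 - theta_t)) / (12.0 - theta_t)
    # M-step: theta = 4 * (E[x1] + x3) / (E[x1] + E[x2] + x3)
    return 4.0 * (e_x1 + x3) / (e_x1 + e_x2 + x3)

def theta_fused(theta_prev, z1=50.0, x3=10.0):
    # Single-formula update obtained by substituting the E-step into the M-step
    num = (8.0 * z1 * theta_prev) + (4.0 * x3 * (12.0 - theta_prev))
    den = (z1 + x3) * (12.0 - theta_prev)
    return num / den

print(em_step(0.01), theta_fused(0.01))  # both yield the same next value
```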
Let's compute the value of θ for z1 = 50 and x3 = 10:
def theta(theta_prev, z1=50.0, x3=10.0):
    # Fused E-step + M-step update derived above
    num = (8.0 * z1 * theta_prev) + (4.0 * x3 * (12.0 - theta_prev))
    den = (z1 + x3) * (12.0 - theta_prev)
    return num / den

theta_v = 0.01

# Iterate the update until convergence
for i in range(1000):
    theta_v = theta(theta_v)

print(theta_v)
1.999999999999999
p = [theta_v/6.0, (1-(theta_v/4.0)), theta_v/12.0]
print(p)
[0.33333333333333315, 0.5000000000000002, 0.16666666666666657]
In this example, we parameterized all three probabilities as functions of θ and, considering that only z1 = x1 + x2 and x3 are observed, we have one degree of freedom in the choice of θ. The reader can repeat the example by fixing the value of p1 or p2 and leaving the remaining probabilities as functions of θ. The computation is almost identical, but in that case there are no residual degrees of freedom.
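As a final check of my own (not in the original text), the observed-data likelihood here depends on θ only through p(z1) = 1 − θ/12 and p(x3) = θ/12, so its MLE is available in closed form as θ* = 12·x3/(z1 + x3), and the EM fixed point must coincide with it:

```python
def theta(theta_prev, z1=50.0, x3=10.0):
    # Fused EM update from the example
    num = (8.0 * z1 * theta_prev) + (4.0 * x3 * (12.0 - theta_prev))
    den = (z1 + x3) * (12.0 - theta_prev)
    return num / den

z1, x3 = 50.0, 10.0

# Closed-form MLE of the observed-data likelihood
theta_star = 12.0 * x3 / (z1 + x3)
print(theta_star)  # 2.0

# EM converges to the same value
theta_v = 0.01
for _ in range(1000):
    theta_v = theta(theta_v)
print(theta_v)
```

This confirms that the iterative procedure is not just a fixed point of the update rule but the actual maximizer of the observed-data likelihood.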