EM Algorithm concept
In a machine learning application, a data set can contain two types of data: observed and unobserved. Unobserved data consists of latent variables and missing values that are not directly observable and must instead be inferred from the values of the other, observed variables.
EM algorithms are known for density estimation (maximum likelihood estimation), and EM is also famous as the basis of clustering algorithms. Specifically, the EM algorithm is an approach for performing maximum likelihood estimation in the presence of latent variables.
Probability Density Estimation
Probability density estimation is the construction of an estimate of a probability distribution based on observed data. It involves selecting a probability distribution function, and the parameters of that function, that best explain the joint probability of the observed data.
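As a concrete illustration, consider data assumed to come from a single Gaussian. In that case the maximum likelihood estimates of the parameters are simply the sample mean and sample standard deviation. The sketch below is ours, not part of the original example:
import numpy as np
# A minimal sketch: density estimation for a single Gaussian.
# Under a normal model, the maximum likelihood estimates of the
# parameters are the sample mean and sample standard deviation.
data = np.random.normal(loc=20, scale=5, size=1000)  # observed data
mu_hat = data.mean()      # estimated mean
sigma_hat = data.std()    # estimated standard deviation
print(mu_hat, sigma_hat)  # should be close to 20 and 5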
Maximum Likelihood Estimation
Density estimation therefore requires us to find a probability distribution function and the parameters of that distribution. The most common technique for solving this problem is Maximum Likelihood Estimation, or simply "maximum likelihood".
In statistics, maximum likelihood estimation is a method of estimating the parameters of a probability distribution by maximizing the likelihood function, so that the observed data are made most probable under the statistical model.
But there lies a limitation with maximum likelihood: it assumes that the data are complete and fully observed. This does not mean the model must have access to all data; rather, it assumes that all the variables relevant to the model are already observed. In some cases, however, some relevant variables remain hidden, which makes direct maximum likelihood estimation inconsistent or intractable.
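In symbols (notation ours, using standard conventions), maximum likelihood picks the parameters theta that maximize the log-likelihood of the observed data. When a latent variable z is present, the log-likelihood contains a sum inside the logarithm, which is what makes direct maximization hard and motivates EM:
\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{N} \log p(x_i \mid \theta),
\qquad
\log p(x \mid \theta) = \log \sum_{z} p(x, z \mid \theta)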
So, to apply the EM algorithm to the clustering problem, we use a Gaussian mixture model, discussed below.
Steps Involved in the EM Algorithm
- Step 1 (Initialization): Consider a set of starting parameters for the incomplete data, i.e. treat the data as complete data with latent variables or missing values to be estimated.
- Step 2, E-step (Expectation step): Estimate the missing values and latent variables from the observed data and the current parameter estimates, updating the completed data.
- Step 3, M-step (Maximization step): Use the data completed in the E-step to update the parameters, i.e. this step updates the hypothesis by maximizing the expected likelihood.
- Step 4: If convergence has not been reached, repeat steps 2 and 3 (a from-scratch sketch of this loop follows this list).
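To make these steps concrete, here is a from-scratch sketch of the EM loop for a one-dimensional mixture of two Gaussians. This is illustrative only, with function and variable names of our own; the practical sklearn-based version appears later in this article:
import numpy as np

def gauss_pdf(x, mu, sigma):
    # Gaussian probability density function
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_gmm_1d(x, n_iter=200, tol=1e-6):
    # Step 1: rough starting parameters for the two components
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point,
        # i.e. the estimated latent assignments under current parameters
        dens = np.stack([pi[k] * gauss_pdf(x, mu[k], sigma[k]) for k in range(2)])
        resp = dens / dens.sum(axis=0)
        # M-step: update parameters to maximize the expected likelihood
        nk = resp.sum(axis=1)
        new_mu = (resp * x).sum(axis=1) / nk
        sigma = np.sqrt((resp * (x - new_mu[:, None]) ** 2).sum(axis=1) / nk)
        pi = nk / x.size
        # Step 4: stop once the means have converged
        if np.allclose(mu, new_mu, atol=tol):
            mu = new_mu
            break
        mu = new_mu
    return mu, sigma, pi
Run on data like the sample generated below, this loop recovers component means near the true values.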
Use of GMM (Gaussian Mixture Model) in EM
- A Gaussian mixture model combines multiple probability distributions (Gaussian components), and fitting it requires estimating the mean and standard deviation parameters of each component.
- A number of techniques exist for estimating the parameters of a Gaussian mixture model from data, but the most common one is maximum likelihood.
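Concretely, a Gaussian mixture with K components models the density of a point x as a weighted sum of Gaussians (standard notation, not from the original article):
p(x \mid \theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2), \qquad \sum_{k=1}^{K} \pi_k = 1
Here the mixing weights pi_k, means mu_k, and standard deviations sigma_k are exactly the parameters that the EM algorithm estimates.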
Let's see this with an example
- Let's take a case in which the data points we have were generated by two different, unknown processes.
- We do not know which data point belongs to which process; these assignments are the latent variables of the problem.
- Both processes have a Gaussian probability distribution.
- We will apply the EM algorithm with a Gaussian mixture model here to determine which data points came from which process.
# Importing the required libraries
from numpy import hstack
from numpy.random import normal
import matplotlib.pyplot as plt
# Defining samples drawn from the two processes
sample1 = normal(loc=20, scale=5, size=4000)
sample2 = normal(loc=40, scale=5, size=8000)
sample = hstack((sample1, sample2))
# Showing the data points as a histogram
plt.hist(sample, bins=50, density=True)
plt.show()
Output: a histogram of the combined sample, showing two overlapping peaks centered near 20 and 40.
Note:
- What we did here is take one-dimensional data with a mean of 20 for process 1, a mean of 40 for process 2, and a standard deviation of 5 for both.
- We have taken 4000 points for process 1 and 8000 points for process 2.
- After that we plotted a graph showing the data distribution.
- As you can see from the graph, for the data points between 20 and 40 it is unclear which distribution or process they belong to.
# Gaussian mixture model with expectation maximization
from numpy import hstack
from numpy.random import normal
# Importing the Gaussian mixture model
from sklearn.mixture import GaussianMixture
# generate a sample
sample1 = normal(loc=20, scale=5, size=4000)
sample2 = normal(loc=40, scale=5, size=8000)
sample = hstack((sample1, sample2))
# reshape into a table with one column to fit the data
sample = sample.reshape((len(sample), 1))
# training the model
model = GaussianMixture(n_components=2, init_params='random')
model.fit(sample)
# predict latent values
yhat = model.predict(sample)
# check latent value for first few points
print(yhat[:80])
# check latent value for last few points
print(yhat[-80:])
Output: two arrays of predicted class labels (0s and 1s), one for the first 80 points and one for the last 80 points.
Note: In the above code we used the EM algorithm with a Gaussian mixture model, and from the output we can see that our model predicts a class, either 0 or 1, for every point.
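As a follow-up check (this snippet continues from the fitted model above and uses sklearn's documented fitted attributes), the learned parameters can be inspected and should land close to the true values used to generate the data:
import numpy as np
# Continues from the fitted `model` above
print(model.means_)                         # component means, near 20 and 40
print(np.sqrt(model.covariances_).ravel())  # standard deviations, near 5
print(model.weights_)                       # mixing weights, near 1/3 and 2/3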