

in Artificial Intelligence(AI) & Machine Learning by Goeduhub's Expert (3k points)
In this article, we will discuss some statistics topics and concepts that are vital in data science. Statistics is itself a large subject, but machine learning and data science both depend on it. So, in this tutorial, we will look at some basic concepts that underlie machine learning and data science.

Central tendency: mean, median, mode.

Dispersion: range, interquartile range (IQR), standard deviation, variance.

Correlation, frequencies, proportion, hypothesis testing and inference. It will be helpful if you have basic knowledge of probability and (vector) algebra.

1 Answer

by Goeduhub's Expert (3k points)

For basic statistics, see Part 1.


In simple terms, frequency describes how often values occur in data (or in an experiment). To show the frequencies of values (or events), we use frequency tables (tabular form) and graphs (histograms).

In statistics, the frequency of an event is the number of times the observation occurred (was recorded) in an experiment or study.
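As a minimal sketch (Python, standard library only), a frequency table can be built by counting how often each value occurs. The helper `frequency_table` and the `grades` data are hypothetical, made up purely for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(values)  # maps each value to its number of occurrences
    return sorted(counts.items())

# Hypothetical example data: grades recorded for 10 students
grades = ["B", "A", "C", "B", "B", "A", "C", "B", "D", "A"]
for value, freq in frequency_table(grades):
    print(value, freq)
```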

Relative frequency: the relative frequency of a value is its frequency divided by the total number of observations. The relative frequencies in a table sum to one (or close to one, because of rounding).

Cumulative frequency: the cumulative frequency of a row is the sum of all frequencies up to and including that row. Likewise, to find the cumulative relative frequency, add all of the previous relative frequencies to the relative frequency of the current row.
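Relative and cumulative relative frequencies follow directly from the frequency column. The sketch below assumes the table is given as hypothetical `(value, frequency)` pairs already in the desired row order; `relative_and_cumulative` is an illustrative name, not a library function.

```python
from itertools import accumulate

def relative_and_cumulative(freqs):
    """Given (value, frequency) pairs, return rows of
    (value, frequency, relative frequency, cumulative relative frequency)."""
    total = sum(f for _, f in freqs)          # total number of observations
    relative = [f / total for _, f in freqs]  # frequency / total
    cumulative = list(accumulate(relative))   # running sum of relative frequencies
    return [(v, f, r, c) for (v, f), r, c in zip(freqs, relative, cumulative)]

# Hypothetical frequency table: (value, frequency)
table = [("A", 3), ("B", 4), ("C", 2), ("D", 1)]
for value, freq, rel, cum in relative_and_cumulative(table):
    print(f"{value}  {freq}  {rel:.2f}  {cum:.2f}")
```

The last cumulative relative frequency is 1, up to floating-point rounding.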

Here is an example:

(Figure: example frequency table)

Histogram: a plot of the frequency table with the bins/values on the x-axis and the count (or proportion) on the y-axis.
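The bar heights of a histogram are just the counts of values falling into each bin. This is a minimal sketch assuming equal-width bins over a chosen range `[lo, hi]`; the function name and the `weights` data are hypothetical, and in practice a plotting library would draw the bars from such counts.

```python
def histogram_counts(data, bins, lo, hi):
    """Count how many values of `data` fall into each of `bins` equal-width
    bins covering [lo, hi]. Values outside the range are ignored;
    a value equal to `hi` goes into the last bin."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if lo <= x <= hi:
            i = int((x - lo) / width)
            counts[min(i, bins - 1)] += 1  # keep x == hi inside the last bin
    return counts

# Hypothetical data: weights (kg) of 10 students, bins 40-50, 50-60, 60-70, 70-80
weights = [48, 52, 55, 57, 60, 61, 63, 66, 70, 74]
print(histogram_counts(weights, bins=4, lo=40, hi=80))
```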

 A density plot is a smoothed version of a histogram. 

(Figure: density plot)
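One common way to produce the smooth curve of a density plot is kernel density estimation (KDE): place a small Gaussian bump on every data point and average them. This sketch assumes a Gaussian kernel and a hand-picked bandwidth; the data and the bandwidth value are illustrative only, and `gaussian_kde` here is a hypothetical hand-written helper.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at x by averaging
    Gaussian kernels centred on each data point."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

# Hypothetical data: weights (kg) of 10 students
weights = [48, 52, 55, 57, 60, 61, 63, 66, 70, 74]
f = gaussian_kde(weights, bandwidth=4.0)
for x in range(40, 85, 5):
    print(x, round(f(x), 4))
```

A larger bandwidth gives a smoother curve; a smaller one follows the individual data points more closely.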

Typical Shapes of plots:

Shape is an important characteristic of a smoothed histogram: depending on its shape, we can tell what type of distribution the data follows.

(Figure: typical shapes of plots)

Gaussian Distribution: 

The Gaussian distribution (also known as the normal distribution) has a bell-shaped curve. Another name previously used for the normal distribution was the error distribution.

Error: in data science/machine learning, error is the difference between the predicted value and the actual data value.

Variable: a variable is a property or characteristic whose value can change (vary).

Discrete variable: number of students present (a count). Continuous variable: weight of students (a measurement).

Random variable: a random variable is a variable whose value is a numerical outcome of a random experiment.

In simple terms, when we select at random (simple random sampling), every member has an equal chance of being selected. For example, with 5 oranges and 5 apples in a bucket, each of the 10 fruits has a 1/10 chance (probability) of being chosen in a single random draw.

For example: 

Let X represent the sum of two dice.

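Then the probability distribution of X (for two fair six-sided dice) is:

x:         2     3     4     5     6     7     8     9     10    11    12
P(X = x):  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36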
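The distribution of the sum of two dice can be checked by enumerating all 36 equally likely outcomes. This sketch assumes two fair six-sided dice and uses exact fractions; `two_dice_distribution` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_distribution():
    """Probability distribution of X = sum of two fair six-sided dice,
    found by enumerating all 36 equally likely outcomes."""
    outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
    counts = Counter(outcomes)
    return {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in two_dice_distribution().items():
    print(f"P(X = {x:2d}) = {p}")
```

Fractions are printed in lowest terms, so P(X = 7) appears as 1/6, the most likely sum.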


Normal Distribution Curve:

(Figure: normal distribution curve)

This is what a normal distribution curve looks like. Here mu (μ) is the mean and sigma (σ) is the standard deviation; the variance is σ².

The mean determines where the curve is centred, and the standard deviation controls how tall and wide it is: a larger σ gives a lower, wider curve.
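The bell curve is the normal probability density function f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)). Its peak is at x = μ with height 1/(σ√(2π)), which is why a larger σ flattens and widens the curve. A minimal sketch writing the formula by hand and cross-checking it against Python's `statistics.NormalDist`; `normal_pdf` is a hypothetical name.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Same mean, different standard deviations: the peak height (at x = mu) shrinks as sigma grows
for sigma in (0.5, 1.0, 2.0):
    peak = normal_pdf(0.0, mu=0.0, sigma=sigma)
    print(f"sigma={sigma}: peak height={peak:.4f}")

# Cross-check against the standard library implementation
assert math.isclose(normal_pdf(1.3, 2.0, 1.5), NormalDist(2.0, 1.5).pdf(1.3))
```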

For a random variable X that follows a Gaussian distribution, we write X ~ N(μ, σ²). The empirical rule, also known as the three-sigma rule or the 68-95-99.7 rule, describes the percentage of values that lie within a band around the mean in a normal distribution.

That is, about 68% of the data lies within one standard deviation of the mean (μ ± σ), about 95% within two standard deviations (μ ± 2σ), and about 99.7% within three standard deviations (μ ± 3σ).
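The 68-95-99.7 figures can be recovered from the normal cumulative distribution function (CDF): the share of values within k standard deviations is CDF(k) - CDF(-k) on the standard normal scale, for any μ and σ. A minimal sketch using `statistics.NormalDist`; `within_k_sigma` is a hypothetical helper.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def within_k_sigma(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {within_k_sigma(k):.4f}")
```

This prints 0.6827, 0.9545 and 0.9973, the exact values behind the rounded 68-95-99.7.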

Standard Normal Distribution:  The standard normal distribution (z distribution) is a normal distribution with a mean of 0 and a standard deviation of 1. 

Any point (x) from a normal distribution can be converted to the standard normal distribution (z) by subtracting the mean and then dividing by the standard deviation; this is called standardization (sometimes also called normalization). The converted value is called the z-score, and the distribution of z-scores is the z distribution.

$$z = \frac{X - \mu}{\sigma}$$

where X is the random variable, μ is the mean of the X data and σ is the standard deviation of the X data.

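If a value of X is below the mean (X < μ), its z-score is negative.

The sign of the z-score indicates whether the value is above (positive) or below (negative) the mean, and its magnitude tells how many standard deviations away from the mean it lies.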
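Standardizing a whole dataset applies z = (X - μ) / σ to every value. This sketch assumes the population standard deviation (`statistics.pstdev`); `statistics.stdev` would give the sample version instead. The `scores` data and the `z_scores` helper are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(data):
    """Standardize data: subtract the mean, then divide by the standard deviation."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation (an assumption here)
    return [(x - mu) / sigma for x in data]

# Hypothetical test scores
scores = [50, 60, 70, 80, 90]
print(z_scores(scores))
```

Values below the mean (50, 60) get negative z-scores, the mean itself gets 0, and the standardized list has mean 0 and standard deviation 1.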

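In simple terms, the standard normal distribution is a common scale: once values from any normal distribution are converted to z-scores, they can be compared with each other and used with a single standard normal table to find probabilities and percentiles that were not obvious from the raw data.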
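For example, the standard normal CDF turns a z-score into the proportion of values lying below it. A minimal sketch with `statistics.NormalDist`; the exam mean of 70 and standard deviation of 10 are hypothetical, and so is the helper name.

```python
from statistics import NormalDist

def percentile_from_value(x, mu, sigma):
    """Fraction of a normal population lying below x, found via its z-score."""
    z = (x - mu) / sigma           # standardize
    return NormalDist().cdf(z)     # standard normal: mean 0, standard deviation 1

# Hypothetical exam: mean 70, standard deviation 10; a score of 85 has z = 1.5
print(round(percentile_from_value(85, 70, 10), 4))
```

The result, 0.9332, says roughly 93% of scores fall below 85.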
