Machine Learning Interview Questions Set 1
Q.1. What is machine learning?
Answer:- Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed. Machine learning focuses on developing computer programs that can access data and use it to learn for themselves.
Q.2. What Are the Different Types of Machine Learning?
Answer:-
- Supervised Learning:- In supervised machine learning, a model learns from labeled data and uses what it has learned to make predictions or decisions on new data. Labeled data refers to data points that have been given tags or labels (for example, the correct answer), which makes them more meaningful.
- Unsupervised Learning:- In unsupervised learning, we don't have labeled data. The model identifies patterns, anomalies, and relationships in the input data on its own.
- Reinforcement Learning:- In reinforcement learning, the model (agent) learns by trial and error, based on the rewards or penalties it receives for its previous actions.
Q.3. What Are the Differences Between Machine Learning and Deep Learning?
Answer:-
Machine Learning | Deep Learning |
Enables machines to make decisions on their own, based on past data | Enables machines to make decisions with the help of artificial neural networks |
Can often be trained on a relatively small amount of data | Typically needs a large amount of training data |
Works well on low-end systems, so large machines are usually not needed | Needs high-end machines because it requires a lot of computing power |
Most features need to be identified in advance and manually coded | The model learns the features from the data it is provided |
Q.4. What Is the Difference Between Inductive Machine Learning and Deductive Machine Learning?
Answer:-
Inductive Learning | Deductive Learning |
It observes instances based on defined principles to draw a conclusion | |
Example: Explaining to a child to keep away from fire by showing a video where fire causes damage | Example: Allowing the child to play with fire. If he or she gets burned, the child will learn that fire is dangerous and will refrain from making the same mistake again |
Q.5. What Is a Random Forest?
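Answer:- A ‘random forest’ is a supervised machine learning algorithm that is used for both classification and regression problems. It operates by constructing multiple decision trees during the training phase, each trained on a random (bootstrap) sample of the data and considering a random subset of the features when splitting. For classification, the random forest takes the class chosen by the majority of the trees as the final decision; for regression, it averages the trees' predictions.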
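A minimal sketch of the idea in plain Python, assuming each "tree" is a one-split decision stump so the example stays short. Each stump is trained on a bootstrap sample and on one randomly chosen feature, and the forest predicts by majority vote. The names fit_stump, fit_forest and predict, and the toy data, are hypothetical; a real implementation (for example scikit-learn's RandomForestClassifier) grows full trees and samples features at every split.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Fit a depth-1 tree: the threshold on one feature with the fewest training errors."""
    overall = Counter(labels).most_common(1)[0][0]
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        # Each side predicts its majority label (or the overall majority if it is empty).
        left_label = Counter(left).most_common(1)[0][0] if left else overall
        right_label = Counter(right).most_common(1)[0][0] if right else overall
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) row indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return trees


def predict(trees, row):
    """Majority vote over all trees in the forest."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]


rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
forest = fit_forest(rows, labels)
print(predict(forest, (1, 1)), predict(forest, (9, 9)))
```

Individual stumps may be wrong (for example, one trained on a bootstrap sample that happens to contain only one class), but the majority vote smooths out those mistakes.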
Q.6. What Is Bias and Variance in a Machine Learning Model?
Answer:- Bias:- Bias in a machine learning model is the error that arises when the predicted values are systematically far from the actual values, often because the model is too simple to capture the underlying pattern. Low bias indicates a model whose predicted values are very close to the actual ones.
Underfitting: High bias can cause an algorithm to miss the relevant relations between features and target outputs.
Variance:- Variance refers to how much the model's predictions change when it is trained on different training data. For a good model, the variance should be kept low.
Overfitting: High variance can cause an algorithm to model the random noise in the training data rather than the intended outputs.
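To make the two terms concrete, here is a small sketch that measures bias and variance at a single input point, assuming we already have the predictions of the same kind of model trained on several different training sets. The prediction values are made up for illustration. It also reports the mean squared error, which in this noise-free toy setup equals bias squared plus variance.

```python
def bias_variance(predictions, true_value):
    """Bias, variance and mean squared error of predictions made at one input point.

    Each prediction is assumed to come from a model trained on a different training set.
    """
    n = len(predictions)
    mean_pred = sum(predictions) / n
    bias = mean_pred - true_value
    variance = sum((p - mean_pred) ** 2 for p in predictions) / n
    mse = sum((p - true_value) ** 2 for p in predictions) / n
    return bias, variance, mse


# Hypothetical predictions for an input whose true target value is 5.0.
simple_model = [4.0, 4.2, 3.8, 4.0]    # consistently too low: high bias, low variance
flexible_model = [3.0, 7.0, 4.0, 6.0]  # right on average but scattered: low bias, high variance

for name, preds in [("simple", simple_model), ("flexible", flexible_model)]:
    bias, variance, mse = bias_variance(preds, 5.0)
    # mse equals bias**2 + variance, up to floating-point rounding
    print(f"{name}: bias={bias:.2f} variance={variance:.2f} mse={mse:.2f}")
```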
Q.7. Explain Classification and Regression.
Answer:- Classification:- It is the task of predicting a discrete class label: each data point is assigned to one of two or more classes. A classification problem with two classes is called binary classification, and one with more than two classes is called multi-class classification. Classifying an email as spam or non-spam is a classification problem.
Regression:- It is the task of predicting a continuous quantity. A regression problem with multiple input variables is called a multiple regression problem (sometimes loosely called multivariate regression). Predicting the price of a stock over a period of time is a regression problem.
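The difference shows up in the type of output. The sketch below, in plain Python with hypothetical data (hours studied versus exam result), fits a least-squares line to predict a continuous score (regression) and uses a simple nearest-centroid rule to predict a discrete pass/fail label (classification). Both models and all function names are minimal illustrations, not recommended production methods.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def fit_centroids(xs, labels):
    """Nearest-centroid classifier: the mean input value of each class."""
    centroids = {}
    for label in set(labels):
        values = [x for x, y in zip(xs, labels) if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def classify(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


hours = [1, 2, 3, 4]
scores = [52, 58, 64, 70]                   # continuous target: regression
results = ["fail", "fail", "pass", "pass"]  # discrete target: classification

slope, intercept = fit_line(hours, scores)
print(slope * 5 + intercept)                       # a quantity: predicted score for 5 hours (76.0)
print(classify(fit_centroids(hours, results), 5))  # a class label: "pass"
```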
Q.8. Compare K-means and KNN Algorithms.
Answer:-
K-means | KNN |
K-Means is unsupervised | KNN is supervised in nature |
K-Means is a clustering algorithm | KNN is a classification algorithm (it can also be used for regression) |
The points in each cluster are similar to each other, and each cluster is different from its neighboring clusters | It classifies an unlabeled observation based on its K (can be any number) nearest labeled neighbors |
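A compact plain-Python sketch of both algorithms on the same hypothetical 2-D points makes the contrast visible: K-means receives no labels and groups the points itself, while KNN needs labeled training points to classify a new one. The function names are illustrative, the K-means initialization (the first k points) is deliberately naive, and math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter


def k_means(points, k, iterations=10):
    """Unsupervised: alternately assign points to the nearest centroid and recompute centroids."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean of the cluster (keep the old one if empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: majority label among the k training points closest to the query."""
    neighbors = sorted(zip(train_points, train_labels), key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
labels = ["a", "b", "a", "a", "b", "b"]  # only KNN uses these

centroids, clusters = k_means(points, k=2)
print([tuple(round(v, 2) for v in c) for c in centroids])
print(knn_predict(points, labels, (2, 2)))
```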
Q.9. What is the difference between Entropy and Information Gain?
Answer:-
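- Entropy is an indicator of how mixed (impure) the class labels in your data are. It generally decreases as you move closer to the leaf nodes, where the data becomes purer.
- Information Gain is the decrease in entropy after a dataset is split on an attribute. A decision tree chooses the split with the highest information gain, and the total information gained since the root accumulates as you move closer to the leaf nodes.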
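Both quantities can be computed directly from class counts. This sketch, with hypothetical helper names and toy labels, uses entropy H = -Σ p·log2(p) and defines information gain as the parent's entropy minus the size-weighted entropy of the child nodes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the weighted average entropy of its children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]        # each child is pure
useless_split = [["yes", "yes", "no", "no"]] * 2  # children as mixed as the parent

print(entropy(parent))                          # 1.0 bit: maximally mixed
print(information_gain(parent, perfect_split))  # 1.0: all uncertainty removed
print(information_gain(parent, useless_split))  # 0.0: nothing learned
```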
Q.10. What do you understand by Eigenvectors and Eigenvalues?
Answer:-
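- Eigenvectors: An eigenvector of a linear transformation (a square matrix A) is a non-zero vector whose direction remains unchanged (or exactly reversed, if the scale factor is negative) when the transformation is applied; it is only scaled, so that Av = λv.
- Eigenvalues: The eigenvalue λ is the scalar by which the transformation scales the corresponding eigenvector.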
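A small worked check in plain Python, assuming a symmetric 2×2 matrix so that the eigenvalues are real. It finds the eigenvalues from the characteristic equation λ² - (a + d)λ + (ad - bc) = 0 and then confirms that multiplying each eigenvector by the matrix only scales it by its eigenvalue. The matrix, the vectors and the helper names are an illustrative choice.

```python
import math


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]], assuming they are real (e.g. a symmetric matrix)."""
    trace, det = a + d, a * d - b * c
    disc = math.sqrt(trace ** 2 - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


def apply(matrix, v):
    """Multiply a 2x2 matrix by a 2-D vector."""
    (a, b), (c, d) = matrix
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])


A = [[2, 1], [1, 2]]
print(eigenvalues_2x2(2, 1, 1, 2))  # (3.0, 1.0)
print(apply(A, (1, 1)))   # (3, 3) = 3 * (1, 1): same direction, scaled by 3
print(apply(A, (1, -1)))  # (1, -1) = 1 * (1, -1): unchanged
print(apply(A, (1, 0)))   # (2, 1): direction changes, so (1, 0) is not an eigenvector
```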
Q.11. Explain Ensemble learning technique in Machine Learning.
Answer:- Ensemble learning is a technique in which multiple machine learning models are built and then combined to produce more accurate results than any single model. A typical single model is built using the entire training data set. In bagging-style ensembles, by contrast, each model is trained on a different subset of the training data (for example, a random sample drawn with replacement). After the models are trained, their predictions are combined, for example by voting or averaging, in a way that reduces the variance of the output.
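One way to see why combining models can help: if several classifiers make independent errors, a majority vote is right more often than any single member. The sketch below computes that probability with the binomial distribution. The independence assumption is idealized (real models trained on similar data make correlated errors, which shrinks the benefit), and the function name is hypothetical. math.comb requires Python 3.8 or later.

```python
from math import comb


def majority_vote_accuracy(p, n_models):
    """P(majority of n_models independent classifiers is correct), each correct with prob. p.

    n_models is assumed to be odd so that there are no ties.
    """
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p ** k * (1 - p) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


for n in (1, 3, 11):
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

With p = 0.7, a single model is right 70% of the time, while three independent models voting together are right 78.4% of the time (0.7³ + 3·0.7²·0.3).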
Q.12. What Are the Applications of Supervised Machine Learning in Modern Businesses?
Answer:- Applications of supervised machine learning include:
- Email Spam Detection- Here we train the model using historical emails that are labeled as spam or not spam. This labeled information is fed as input to the model.
- Healthcare Diagnosis- By training on medical images labeled with whether a disease is present, a model can learn to detect whether a person is suffering from the disease.
- Sentiment Analysis- This refers to using algorithms to mine documents and determine whether they’re positive, neutral, or negative in sentiment.
- Fraud Detection- By training the model to identify suspicious patterns, we can detect instances of possible fraud.
Q.13. What Are Unsupervised Machine Learning Techniques?
Answer:- Two common techniques used in unsupervised learning are clustering and association.
- Clustering- Clustering problems involve dividing data into subsets, also called clusters, whose members are similar to each other. Unlike classification or regression, no labels are given in advance; the clusters themselves reveal different details about the objects.
- Association- In an association problem, we identify patterns of association between different variables or items. For example, an e-commerce website can suggest other items for you to buy, based on your prior purchases, spending habits, items in your wish list, other customers’ purchase habits, and so on.
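Association rules are usually scored with support (how often an itemset appears across all transactions) and confidence (how often the consequent appears in transactions that contain the antecedent). A minimal sketch over a hypothetical shopping-basket dataset; full algorithms such as Apriori add efficient candidate generation on top of these two measures.

```python
from itertools import permutations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent in basket | antecedent in basket)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# Print every single-item rule x -> y whose confidence is at least 0.6.
items = sorted(set().union(*transactions))
for x, y in permutations(items, 2):
    c = confidence({x}, {y}, transactions)
    if c >= 0.6:
        print(f"{x} -> {y}: confidence {c:.2f}")
```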
Q.14. What are collinearity and multicollinearity?
Answer:-
- Collinearity occurs when two predictor variables (e.g., x1 and x2) in a multiple regression are strongly linearly correlated.
- Multicollinearity occurs when more than two predictor variables (e.g., x1, x2, and x3) are strongly inter-correlated.
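A quick first check for collinearity is the Pearson correlation between each pair of predictors; values close to +1 or -1 signal trouble. Below is a plain-Python sketch with hypothetical predictors: a floor area in square metres, the same area converted to square feet (using roughly 10.764 ft² per m²), and an unrelated variable. Pairwise correlations can miss multicollinearity that involves three or more variables together, which is why measures such as the variance inflation factor (VIF) are also used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


area_m2 = [50, 60, 70, 80]
area_ft2 = [a * 10.764 for a in area_m2]  # the same information in different units
other = [3, 1, 4, 2]                      # unrelated to area in this toy data

print(round(pearson(area_m2, area_ft2), 6))  # 1.0: perfectly collinear
print(round(pearson(area_m2, other), 6))     # 0.0: no linear relationship
```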
Q.15. What is Cluster Sampling?
Answer:-
- It is a process of randomly selecting intact groups within a defined population, sharing similar characteristics.
- Cluster Sample is a probability sample where each sampling unit is a collection or cluster of elements.
- For example, if you’re counting the total number of managers in a set of companies, the managers are the elements and the companies are the clusters.
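In code, cluster sampling means drawing whole groups at random and then keeping every element inside each chosen group. A minimal sketch using the managers-and-companies example; the company and manager names and the cluster_sample helper are made up for illustration.

```python
import random

# Hypothetical population: companies (clusters) and their managers (elements).
companies = {
    "Acme": ["Ana", "Ben"],
    "Globex": ["Cai", "Dee", "Eli"],
    "Initech": ["Fay"],
    "Umbrella": ["Gus", "Hal"],
}


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters whole clusters and return all of their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


sample = cluster_sample(companies, 2, seed=42)
managers = [m for group in sample.values() for m in group]
print(sample)
print(len(managers), "managers sampled")
```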
Q.16. What is the difference between Gini Impurity and Entropy in a Decision Tree?
Answer:-
- Gini Impurity and Entropy are both metrics used to decide how to split a node in a Decision Tree.
- Gini Impurity is the probability that a randomly chosen sample would be classified incorrectly if it were labeled randomly according to the label distribution in the branch.
- Entropy measures the uncertainty (lack of information) about the labels in a node. You calculate the Information Gain (the difference in entropy before and after a split) for each candidate split; choosing the split with the highest gain reduces the uncertainty about the output label.
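Both impurity measures can be computed from the class proportions p_i in a node: Gini impurity is 1 - Σ p_i² (equivalently Σ p_i(1 - p_i), the chance of mislabeling a random sample) and entropy is -Σ p_i·log2(p_i). This sketch with made-up proportions shows that both are zero for a pure node and largest for an evenly mixed one; the function names are illustrative.

```python
import math


def gini(proportions):
    """Gini impurity: probability that a random sample is mislabeled by a random label draw."""
    return 1 - sum(p * p for p in proportions)


def entropy(proportions):
    """Shannon entropy in bits; classes with p = 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)


for proportions in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(proportions, round(gini(proportions), 3), round(entropy(proportions), 3))
```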
Q.17. What’s the difference between Type I and Type II error?
Answer:-
Type I Error | Type II Error |
Type I error is a false positive | Type II error is a false negative |
Type I error is claiming something has happened when it hasn't | Type II error is claiming nothing has happened when in fact something has |
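Given true outcomes and a model's yes/no predictions, the two error types can be counted directly. The data and the error_types helper below are hypothetical (True meaning, say, "the patient has the disease").

```python
def error_types(actual, predicted):
    """Count Type I errors (false positives) and Type II errors (false negatives)."""
    false_positives = sum(1 for a, p in zip(actual, predicted) if p and not a)
    false_negatives = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return false_positives, false_negatives


actual = [True, False, False, True, False]
predicted = [True, True, True, False, False]
type_1, type_2 = error_types(actual, predicted)
print("Type I (false positives):", type_1)   # 2
print("Type II (false negatives):", type_2)  # 1
```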
Q.18. What do you understand by selection bias?
Answer:-
- It is a statistical error that causes a bias in the sampling portion of an experiment.
- The error causes one sampling group to be selected more often than other groups included in the experiment.
- If it is not identified, selection bias may lead to inaccurate conclusions.
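A small simulation makes the effect visible. Assuming a hypothetical population of the values 1 to 100, a simple random sample gives an estimate close to the true mean, whereas a sample drawn only from the upper half of the population (a biased selection procedure) overestimates it, and taking a larger sample from the same biased procedure would not fix that.

```python
import random
import statistics

population = list(range(1, 101))  # hypothetical population; true mean is 50.5
rng = random.Random(0)

random_sample = rng.sample(population, 30)
# Biased procedure: only units with value > 50 can ever be selected.
biased_sample = rng.sample([x for x in population if x > 50], 30)

print("true mean:", statistics.mean(population))
print("random sample mean:", round(statistics.mean(random_sample), 1))
print("biased sample mean:", round(statistics.mean(biased_sample), 1))
```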
Q.19. What is the "Curse of Dimensionality"?
Answer:- Searching through a solution space becomes much harder as the number of features (dimensions) grows.
Consider the analogy of looking for a penny on a line vs. in a field vs. in a building. The more dimensions you have, the more data you need to cover the space.
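One standard way to quantify this, assuming data spread uniformly over a unit hypercube: to capture a fixed fraction of the data in a smaller sub-cube, the sub-cube's edge must be fraction ** (1 / d) of the full range of each feature. The helper name is illustrative.

```python
def edge_length(fraction, dims):
    """Edge of a sub-cube holding `fraction` of uniformly spread data in `dims` dimensions."""
    return fraction ** (1 / dims)


for d in (1, 2, 3, 10, 100):
    print(f"{d:>3} dims: edge {edge_length(0.10, d):.3f}")
```

To capture 10% of the data, the edge is 0.1 of the range in one dimension but about 0.79 in ten dimensions, so a supposedly "local" neighborhood ends up spanning most of the range of every feature.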
Q.20. What are the advantages and disadvantages of decision trees?
Answer:- Advantages: Decision trees are easy to interpret, non-parametric (they make no assumptions about the underlying data distribution), relatively robust to outliers, and have relatively few parameters to tune.
Disadvantages: Decision trees are prone to overfitting. However, this can be addressed by ensemble methods like random forests or boosted trees.