in Python Programming by Goeduhub's Expert (2.2k points)
Web scraping project to analyze products from Flipkart. This article covers the following questions: How do you scrape data from the Flipkart website using Python? How do you check whether you can scrape a website? How can I scrape Flipkart? Does Flipkart allow web scraping?

1 Answer

by Goeduhub's Expert (2.2k points)
How to Scrape Data from Flipkart

We need to follow these steps to extract the data:

  1. Import the necessary libraries, such as BeautifulSoup, requests, pandas and csv.
  2. Find the URL that we want to extract data from.
  3. Inspect the page and identify the HTML elements (by tag and class name) that we want to extract.
  4. Write the scraping code.
  5. Store the result in the desired format.
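
The question also asks how to check whether a website can be scraped. One common first check is the site's robots.txt file. The sketch below uses Python's standard urllib.robotparser on a made-up set of rules. The rules, the example.com URLs, the "my-scraper" user agent and the is_allowed helper are all assumptions for illustration and say nothing about Flipkart's actual policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, used only for illustration.
# They are NOT Flipkart's actual rules.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


search_ok = is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper",
                       "https://example.com/search?q=laptop")
checkout_ok = is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper",
                         "https://example.com/checkout/cart")
print(search_ok, checkout_ok)
```

For a live site, RobotFileParser can load the real file with set_url() followed by read(). robots.txt is only one signal, so the site's terms of use are worth reading as well.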

Step 1  Importing necessary libraries

from bs4 import BeautifulSoup 

import requests 

import csv

import pandas as pd

Requests is a Python HTTP library. With its help we make a request to a web page.

Step 2  Find url that we want to extract

In this example we extract data from the Flipkart website and compare the prices and ratings of different laptops. The URL of the Flipkart search page that lists laptops is

https://www.flipkart.com/search?q=laptop&sid=6bo%2Cb5g&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_1_6_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_6_na_na_na&as-pos=1&as-type=RECENT&suggestionId=laptop%7CLaptops&requestId=7ec220e8-4f02-4150-9e0b-9e90cf692f4b&as-searchtext=laptop
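
The URL is long because it carries many query parameters. As a rough sketch, the standard urllib.parse module can split it up so that the search term is easy to see. Which of these parameters the site actually needs is not stated here, so treat any conclusion about the others as an assumption.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.flipkart.com/search?q=laptop&sid=6bo%2Cb5g&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_1_6_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_6_na_na_na&as-pos=1&as-type=RECENT&suggestionId=laptop%7CLaptops&requestId=7ec220e8-4f02-4150-9e0b-9e90cf692f4b&as-searchtext=laptop"

parts = urlsplit(url)
params = parse_qs(parts.query)  # maps each parameter name to a list of values

print(parts.netloc, parts.path)
for key, values in params.items():
    print(key, "=", values)
```

The q parameter holds the search term (laptop), and percent-encoded values such as 6bo%2Cb5g are decoded by parse_qs.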

To get the contents of the specified URL, submit a request using the requests library. 

url="https://www.flipkart.com/search?q=laptop&sid=6bo%2Cb5g&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_1_6_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_6_na_na_na&as-pos=1&as-type=RECENT&suggestionId=laptop%7CLaptops&requestId=7ec220e8-4f02-4150-9e0b-9e90cf692f4b&as-searchtext=laptop"

response = requests.get(url)

htmlcontent = response.content

soup = BeautifulSoup(htmlcontent,"html.parser")

print(soup.prettify())

  1. Here we use BeautifulSoup to parse the HTML content with Python's built-in html.parser.
  2. Rather than printing the raw soup, we print the result of soup.prettify(), which indents the HTML so that the page structure is easier to read.
  3. Check that your output is HTML code.
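
The request above assumes that the server answers with the page. A slightly more defensive sketch is shown below. The fetch_html helper, the HEADERS value and the 10-second timeout are hypothetical choices, not something the site requires, and some sites respond differently to scripts than to browsers.

```python
import requests

# Assumption: an illustrative User-Agent string; any value here is a guess.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; tutorial-scraper)"}


def fetch_html(url, session=None, timeout=10):
    """Fetch a page and return its HTML text, raising on HTTP errors."""
    session = session or requests.Session()
    response = session.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text


# Usage (performs a real network request, so it is left commented out):
# html = fetch_html("https://www.flipkart.com/search?q=laptop")
```

Passing a session makes the function easy to try with a stand-in object, and raise_for_status() turns a blocked or failed request into an exception instead of silently parsing an error page.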

[Screenshot: Flipkart search results page listing laptops]

This is a snapshot of the Flipkart page, which lists information about different laptops. We want to extract the product name, the product price and the product rating.

Step 3  Inspect the page and identify the HTML elements that we want to extract

Right-click on the Flipkart page, choose Inspect, and then select the elements that you want to extract.

[Screenshot: inspecting the Flipkart page in the browser]

After clicking Inspect, a “Browser Inspector Box” opens. We observe that the class name of the descriptions is ‘_4rR01T’, so we use the find method to extract the first laptop description. Class names like this are generated by the site and may be different when you inspect the page yourself.

[Screenshot: finding the element's class name in the inspector]

products=[]

prices=[]

ratings=[]

product=soup.find('div',attrs={'class':'_4rR01T'})

print(product.text)

Output-

HP 14s Core i3 10th Gen - (8 GB/256 GB SSD/Windows 10 Home) 14s-cf3074TU Thin and Light Laptop 

Here we get the description of only one laptop, because find returns the first match. To get all laptop descriptions we need to write a loop.
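
The difference between a single match and all matches can be tried offline. The sketch below runs on a small, made-up HTML snippet that only reuses the class name mentioned above. The laptop names are placeholders, and find_all is the current name for what older code calls findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described above.
SAMPLE_HTML = """
<div class="_4rR01T">Laptop A</div>
<div class="_4rR01T">Laptop B</div>
<div class="_4rR01T">Laptop C</div>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first matching element (or None if there is none).
first = sample_soup.find("div", attrs={"class": "_4rR01T"})

# find_all() returns every matching element, so we can loop over them.
all_names = [div.text for div in sample_soup.find_all("div", attrs={"class": "_4rR01T"})]

print(first.text)
print(all_names)
```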

Step 4 Writing code for scraping.

Find the class names for the price, the rating and the complete product card, then loop over every product card on the page.

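for a in soup.find_all('a',href=True, attrs={'class':'_1fQZEK'}):
  # Each matching <a> tag wraps one product card
  name=a.find('div',attrs={'class':'_4rR01T'})
  price=a.find('div',attrs={'class':'_30jeq3 _1_WHN1'})
  rating=a.find('div',attrs={'class':'_3LWZlK'})
  # find() returns None if an element is missing (e.g. a product with no rating)
  products.append(name.text if name else None)
  prices.append(price.text if price else None)
  ratings.append(rating.text if rating else None)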

Step 5 Store the result in desired format.

import pandas as pd

df = pd.DataFrame({'Product Name':products,'Prices':prices,'Ratings':ratings})

df.head()

Output-

   Product Name                                        Prices  Ratings
0  HP 14s Core i3 10th Gen - (8 GB/256 GB SSD/Win...  ₹36,990      4.2
1  HP 15 Ryzen 3 Dual Core 3200U - (4 GB/1 TB HDD...  ₹29,990      4.1
2  Lenovo Ideapad S145 Core i3 7th Gen - (4 GB/1 ...  ₹30,990      4.1
3  HP 15s Ryzen 5 Quad Core 3450U - (8 GB/1 TB HD...  ₹40,990      4.2
4  Asus TUF Gaming A17 Ryzen 5 Hexa Core 4600H - ...  ₹63,990      4.7
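
The scraped prices and ratings are text, so comparing them needs a conversion to numbers first. A minimal sketch, using the five rows shown above (product names left truncated as in the output), could look like this. The parse_price helper is a hypothetical name and assumes prices always look like ₹36,990.

```python
def parse_price(text):
    """Convert a price string such as '₹36,990' to the integer 36990."""
    return int(text.replace("₹", "").replace(",", "").strip())


# Rows copied from the output above; names are truncated exactly as printed.
rows = [
    ("HP 14s Core i3 10th Gen - (8 GB/256 GB SSD/Win...", "₹36,990", "4.2"),
    ("HP 15 Ryzen 3 Dual Core 3200U - (4 GB/1 TB HDD...", "₹29,990", "4.1"),
    ("Lenovo Ideapad S145 Core i3 7th Gen - (4 GB/1 ...", "₹30,990", "4.1"),
    ("HP 15s Ryzen 5 Quad Core 3450U - (8 GB/1 TB HD...", "₹40,990", "4.2"),
    ("Asus TUF Gaming A17 Ryzen 5 Hexa Core 4600H - ...", "₹63,990", "4.7"),
]

# Convert price to int and rating to float so they compare numerically.
cleaned = [(name, parse_price(price), float(rating)) for name, price, rating in rows]

cheapest = min(cleaned, key=lambda row: row[1])
best_rated = max(cleaned, key=lambda row: row[2])

print("Cheapest:", cheapest)
print("Best rated:", best_rated)
```

With the DataFrame from this step, the same helper could be applied as df['Prices'].map(parse_price), provided no price is missing.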

Store in a csv file-

df.to_csv('products.csv')

A file named “products.csv” is created, and this file contains the extracted data.
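
The csv module imported in Step 1 is not used above, since pandas writes the file. As an alternative sketch, the same kind of rows can be written with csv directly. Here an in-memory buffer stands in for the file so the result can be inspected. The write_products_csv helper is a hypothetical name, and the two rows are taken from the output shown earlier.

```python
import csv
import io

rows = [
    ("HP 14s Core i3 10th Gen - (8 GB/256 GB SSD/Win...", "₹36,990", "4.2"),
    ("HP 15 Ryzen 3 Dual Core 3200U - (4 GB/1 TB HDD...", "₹29,990", "4.1"),
]


def write_products_csv(fileobj, rows):
    """Write a header line followed by one CSV line per product row."""
    writer = csv.writer(fileobj)
    writer.writerow(["Product Name", "Prices", "Ratings"])
    writer.writerows(rows)


buffer = io.StringIO()
write_products_csv(buffer, rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back shows that the comma inside the price survives intact.
read_back = list(csv.reader(io.StringIO(csv_text)))
```

To write a real file, pass open('products.csv', 'w', newline='', encoding='utf-8') instead of the buffer. With pandas, df.to_csv('products.csv', index=False) leaves out the extra index column.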
