in Python Programming by Goeduhub's Expert (3.1k points)
What is Web Scraping with Python?
Tags: web-scraping, python, beautifulsoup
by (100 points)
Going to use it and try from now on for sure.


5 Answers

by Goeduhub's Expert (3.1k points)
Best answer

What is Web Scraping

Web scraping means extracting data from a website, usually by downloading its HTML pages and pulling out the parts we need. It saves a lot of time because a large amount of data can be collected and put into a structured form.

To use data in a program it has to be in a structured form; if it is not, we first have to convert it into one.

How to Scrape Data from Any Website

We can extract data from a website in two ways.

  1. By using the website's API. Some websites, such as Twitter, Facebook and YouTube, provide an API through which their data can be requested.
  2. Not every website has an API. In that case we extract the data by accessing the site's HTML pages; this is called web scraping.
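The difference between the two approaches can be seen in a minimal sketch. The JSON payload and the HTML snippet are made-up sample data, not real responses from any site, and the standard-library html.parser module stands in for a full scraping library only to keep the example free of dependencies.

```python
import json
from html.parser import HTMLParser

# Made-up sample data standing in for real responses (assumption).
api_response = '{"title": "Web Scraping with Python", "views": 2200}'
html_response = (
    "<html><head><title>Web Scraping with Python</title></head>"
    "<body><p>2.2k views</p></body></html>"
)

# Way 1: an API usually returns structured data (often JSON),
# so a single call turns it into a Python dictionary.
api_data = json.loads(api_response)
print(api_data["title"])


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> tag."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


# Way 2: without an API we only have HTML, so the page has to be
# parsed and the wanted value located among the tags.
parser = TitleParser()
parser.feed(html_response)
print(parser.title)
```

Both paths end with the same title, but the HTML path needs code that knows where in the page the value lives.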

Web Scraping Using Python

To scrape data with Python we have to install a few libraries:

  1. requests
  2. html5lib
  3. bs4

You can install them with pip, for example: pip install requests html5lib bs4. We will see where they are used.

Let's start with an example 

# Importing libraries
import requests
import urllib.request
from bs4 import BeautifulSoup

Note

  1. Requests is a Python HTTP library. We use it to send a request to a web page.
  2. Beautiful Soup is a web scraping tool that helps you parse and clean up the documents you have pulled down from the web. See the official Beautiful Soup documentation.

# Get the HTML content
url = "https://goeduhub.com/"
response = requests.get(url)
print(response)
htmlcontent = response.content
# print(htmlcontent)

Output

 

Note

  1. In this code we first choose the website whose data we want to extract and request its URL with the requests module.
  2. We then print response (a variable). If the output is <Response [200]>, the request succeeded.
  3. Printing the HTML content is commented out because it would print the entire HTML file of the site, which is very large. You can uncomment that line to try it.
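The number in a printed response such as <Response [200]> is the HTTP status code. As a small sketch, the standard-library http.HTTPStatus enum can translate a code into its meaning. The helper name describe_status is hypothetical and is not part of the requests library.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    status = HTTPStatus(code)
    # Codes from 200 to 299 mean the request was handled successfully.
    outcome = "success" if 200 <= code < 300 else "not a success"
    return f"{code} {status.phrase} ({outcome})"


for code in (200, 404, 500):
    print(describe_status(code))
```

With requests, the same number is available as response.status_code, and response.raise_for_status() raises an exception for 4xx and 5xx codes.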

# Parse the HTML content
soup = BeautifulSoup(htmlcontent, "html.parser")
# print(soup)
# prettify is a method, so it has to be called with parentheses
print(soup.prettify())

Output 

Note

  1. Here we use Beautiful Soup to parse the HTML content with the help of the HTML parser ("html.parser").
  2. Instead of printing the soup directly, we print the prettified soup, which shows the HTML page in a more readable, indented form.
  3. The output above shows only a small snapshot of the result.

# Printing the title of the web page
# (find_all is the current name of the method; findAll is an older alias)
print(soup.find_all('title'))
# All the div tags in the HTML page
print(soup.find_all('div'))
# All <text> tags in the HTML page (HTML has no such tag, so the list is empty)
print(soup.find_all('text'))
# The fifth anchor tag in the web page (list index 4)
one_tag = soup.find_all('a')[4]
print(one_tag)

Output1 (title of page)

 

Output2 (div of page)

Output3 (text in page) 

 Output4 (anchor of page)

Note

  1. To get data out of an HTML file we access its HTML tags.
  2. The code above does this for a few of the tags.
  3. The third output is an empty list. This does not mean the page contains no text: searching for 'text' looks for <text> tags, and HTML pages do not have any. To read the text of a page, use soup.get_text(). Also note that the tags returned by this Beautiful Soup search come back as a list.
  4. We can index that list like any other Python list, which is what the fourth output shows.
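A short sketch of the difference between searching for a tag and reading text, run on a made-up HTML snippet instead of the live page (the snippet and its contents are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page (assumption).
html = """
<html><head><title>Sample Page</title></head>
<body>
  <div><a href="/first">First</a></div>
  <div><a href="/second">Second</a></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like result holding every matching tag.
divs = soup.find_all("div")
print(len(divs))

# There is no <text> tag in this HTML, so the result is empty.
print(soup.find_all("text"))

# Index the result like a normal list; index 1 is the second anchor.
second_link = soup.find_all("a")[1]
print(second_link["href"])

# To read the text inside a tag, call get_text() on it.
print(second_link.get_text())
```

The same get_text() call on the soup object itself returns the text of the whole document.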

# Alternate way of accessing tags
# Get the title of the HTML page
title = soup.title
print(title)

Output

Note

Other HTML tags can be accessed in the same way. The limitation is that this gives only one tag at a time: the first tag of that name in the page.

HTML Tree 

The HTML Document Tree

Note: This is only to show how the tags of an HTML document are connected to each other. These connections can also be used to access the HTML tags.
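As a sketch of using the tree to reach tags, Beautiful Soup exposes the connections between tags through attributes such as .parent and .children. The HTML snippet is a made-up example with a small, clearly nested structure.

```python
from bs4 import BeautifulSoup

# Made-up HTML without whitespace between tags (assumption).
html = (
    "<html><head><title>Tree</title></head>"
    "<body><ul><li>one</li><li>two</li></ul></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

first_item = soup.find("li")

# Walk up: the parent of <li> is <ul>, whose parent is <body>.
print(first_item.parent.name)
print(first_item.parent.parent.name)

# Walk down: the children of <ul> are its <li> tags.
items = [child.get_text() for child in soup.ul.children]
print(items)

# Walk sideways: the next <li> sibling of the first <li> is the second one.
print(first_item.find_next_sibling("li").get_text())
```

On real pages the children of a tag often include whitespace strings between the tags, so filtering by tag name is usually needed.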

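# Get all the links from the HTML page
anchor = soup.find_all('a')
all_link = set()
for link in anchor:
    href = link.get('href')
    # Skip anchors that have no href and '#' placeholder links
    if href and href != '#':
        linkprint = "https://goeduhub.com/" + href
        # Store the full URL (not the tag) so the set removes duplicates
        all_link.add(linkprint)
        print(linkprint)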

Output

Note

  1. Here we collect all the links on the page at the URL we chose.
  2. You may think the Beautiful Soup find_all (findAll) method alone is enough. It does return the links, but only as they appear in the HTML; we cannot click them to go to the actual page.
  3. We first store the anchor tags in a variable named anchor.
  4. We then loop over them. Many pages use '#' in place of a real link, so a condition is added to skip such empty links.
  5. Next we prepend the site URL to each link.
  6. We use a Python set instead of a list so that each link is stored only once (sets hold unique values).
  7. When you run this code, the links in the output are complete URLs, so you can open a page by clicking on one.
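One caveat with prefixing every href with the site URL is that some links may already be absolute. A minimal sketch of a more robust alternative uses urljoin from the standard library; the href values are made-up examples of the kinds commonly found in pages, not values taken from the site.

```python
from urllib.parse import urljoin

base_url = "https://goeduhub.com/"

# Made-up href values (assumption), including a duplicate,
# a "#" placeholder and a missing href (None).
hrefs = ["./questions", "/tags", "https://example.com/page", "#", None, "./questions"]

all_links = set()
for href in hrefs:
    # Skip anchors without an href and "#" placeholders.
    if not href or href == "#":
        continue
    # urljoin resolves relative hrefs against the base URL and
    # leaves absolute URLs unchanged.
    all_links.add(urljoin(base_url, href))

for link in sorted(all_links):
    print(link)
```

The set keeps three unique URLs here, because the repeated relative link resolves to the same address.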


by (100 points)
In this article, I learned about web scraping.
Web scraping is a method of extracting data from a website. Some libraries used for web scraping in Python are:
  requests
  html5lib
  bs4
by (592 points)

24/04/2020  

Why Python for Web Scraping?

You have probably heard how good Python is, but other languages are good too. So why choose Python over them for web scraping?

Here is a list of Python features that make it well suited to web scraping.

  • Ease of Use: Python is simple to code. You do not need semicolons “;” to end statements or curly braces “{}” to mark blocks, which makes the code less cluttered and easier to write.
  • Large Collection of Libraries: Python has a huge collection of libraries, such as NumPy, Matplotlib and Pandas, which provide methods and services for various purposes. This makes it suitable both for web scraping and for further manipulation of the extracted data.
  • Dynamically Typed: In Python you do not have to declare data types for variables; you can use variables directly wherever they are required. This saves time and makes your job faster.
  • Easily Understandable Syntax: Python syntax is easy to understand, mainly because reading Python code is very similar to reading a statement in English. It is expressive and readable, and the indentation used in Python helps the reader tell different scopes and blocks of code apart.
  • Small Code, Large Task: Web scraping is used to save time, and that would be pointless if writing the code took longer. In Python, a small amount of code can do a large task, so you save time while writing the code as well.
  • Community: If you get stuck while writing the code, you do not have to worry. Python has one of the biggest and most active communities, where you can ask for help.

How does Web Scraping work?

When you run web scraping code, a request is sent to the URL you specified. In response, the server sends the data and lets you read the HTML or XML page. The code then parses the HTML or XML page, finds the data and extracts it.

To extract data using web scraping with Python, follow these basic steps:

1. Find the URL that you want to scrape
2. Inspect the page
3. Find the data you want to extract
4. Write the code
5. Run the code and extract the data
6. Store the data in the required format
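The last step, storing the data, can be sketched with the standard library alone. The records are made-up placeholders for whatever a scraper extracted, and the column names title and answers are hypothetical.

```python
import csv
import io
import json

# Made-up records standing in for scraped results (assumption).
records = [
    {"title": "What is Web Scraping with python?", "answers": 5},
    {"title": "Another question", "answers": 2},
]

# CSV: one row per record. An in-memory buffer is used here; replace it
# with open("data.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "answers"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)

# JSON: keeps the structure and the data types of the records.
json_text = json.dumps(records, indent=2)
print(json_text)
```

CSV suits flat, table-like data that will be opened in a spreadsheet or loaded into Pandas, while JSON is the simpler choice when records are nested.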
by (294 points)
In this chapter I learned how to extract data from any website.
by (422 points)
From the above article we got to know that web scraping is used to extract data from a website. We also learnt how to import the libraries and how Python is used for web scraping.
by (391 points)

Date: 02/05/2020

Topic: Web Scraping

I learned how to extract data from a particular website. With web scraping we can get structured data; for that, some libraries need to be installed.

by (232 points)
Topic: Web Scraping

I learned about web scraping in Python and methods of extracting data from a website.
by (294 points)
In this chapter I learned about web scraping and how to extract data from any website.
