PYTHON

Scraping components of a webpage

# Import the libraries
from bs4 import BeautifulSoup
import requests
import csv


# Step 1: Send an HTTP request to a URL
url = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"
# Make a GET request to fetch the raw HTML content
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early on an HTTP error
html_content = response.text


# Step 2: Parse the HTML content
soup = BeautifulSoup(html_content, "lxml")
# print(soup.prettify())  # print the parsed HTML


# Step 3: Analyze the HTML tags where your content lives
# Create a dictionary to store the data.
data = {}
# Get the first table that has the class "wikitable".
# The code below assumes this outer table holds a heading row followed by
# a row of three nested tables; the live page layout may differ.
gdp_table = soup.find("table", attrs={"class": "wikitable"})
gdp_table_data = gdp_table.tbody.find_all("tr")  # contains 2 rows

# Get the headings of all the lists
headings = []
for td in gdp_table_data[0].find_all("td"):
    # replace newlines and strip extra spaces from left and right
    headings.append(td.b.text.replace('\n', ' ').strip())

# Get all 3 tables contained in "gdp_table"
for table, heading in zip(gdp_table_data[1].find_all("table"), headings):
    # Get the headers of the table, i.e. Rank, Country, GDP.
    t_headers = []
    for th in table.find_all("th"):
        # replace newlines and strip extra spaces from left and right
        t_headers.append(th.text.replace('\n', ' ').strip())

    # Get all the rows of the table
    table_data = []
    for tr in table.tbody.find_all("tr"):  # find all tr's in the table's tbody
        t_row = {}
        # Each table row is stored in the form
        # t_row = {'Rank': '', 'Country/Territory': '', 'GDP(US$million)': ''}

        # find all td's (3) in the tr and zip them with t_headers
        for td, th in zip(tr.find_all("td"), t_headers):
            t_row[th] = td.text.replace('\n', '').strip()
        table_data.append(t_row)

    # Store the table's data under its heading.
    data[heading] = table_data


# Step 4: Export the data to CSV
# For this example, create 3 separate CSV files, one for each of the 3 tables.
for topic, table in data.items():
    # Create a CSV file for each table; newline='' is what the csv module expects
    with open(f"{topic}.csv", 'w', newline='', encoding='utf-8') as out_file:
        # Take the column names from the rows themselves (== t_headers),
        # so DictWriter never receives a key it does not know about.
        headers = list(dict.fromkeys(key for row in table for key in row))
        writer = csv.DictWriter(out_file, headers)
        # write the header
        writer.writeheader()
        for row in table:
            if row:  # skip empty rows, e.g. a header row without td cells
                writer.writerow(row)
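
The CSV export in Step 4 can be tried on its own, without fetching any page. The following is a minimal sketch using only the standard library. The function name rows_to_csv and the sample rows are hypothetical (placeholder values, not real GDP figures), and the output goes to an in-memory buffer instead of a file so nothing is written to disk.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise a list of row dictionaries to CSV text (illustrative helper)."""
    # Drop empty dictionaries, e.g. a header row that had no <td> cells.
    rows = [row for row in rows if row]
    # Ordered union of the keys, so columns appear in first-seen order.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder rows in the same shape as t_row above; the values are made up.
sample_rows = [
    {},
    {"Rank": "1", "Country/Territory": "Country A", "GDP(US$million)": "300"},
    {"Rank": "2", "Country/Territory": "Country B", "GDP(US$million)": "200"},
]

csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Deriving the column names from the rows avoids the ValueError that csv.DictWriter raises when a row contains a key missing from a hard-coded header list.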