Python

Extract all visible text from a website using BeautifulSoup and Python

from bs4 import BeautifulSoup
from bs4.element import Comment
import urllib.request


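def tag_visible(element):
    """Return True if a text node would be rendered as visible page text."""
    # Text inside these tags is never displayed in the page body.
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    # HTML comments are also returned as text nodes, so drop them explicitly.
    if isinstance(element, Comment):
        return False
    return True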


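def text_from_html(body):
    """Extract the visible text from an HTML document, joined by single spaces."""
    soup = BeautifulSoup(body, 'html.parser')
    # find_all(string=True) returns every text node; it is the current
    # spelling of the older findAll(text=True).
    texts = soup.find_all(string=True)
    visible_texts = filter(tag_visible, texts)
    # Strip each node and skip whitespace-only ones to avoid runs of spaces.
    return " ".join(t.strip() for t in visible_texts if t.strip())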

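# Fetch the raw HTML bytes; the context manager closes the connection afterwards.
with urllib.request.urlopen('http://www.nytimes.com/2009/12/21/us/21storm.html') as response:
    html = response.read()
print(text_from_html(html))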
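
The same idea can be tried offline on an inline HTML string, which makes it easy to check what counts as visible text. The sketch below is an alternative, not the snippet's own method: instead of filtering text nodes with tag_visible, it removes the hidden tags from the tree and then calls BeautifulSoup's get_text(). The names HIDDEN_TAGS, visible_text and sample are hypothetical and used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical list of tags whose contents are not rendered as page text
# (it mirrors the list used in tag_visible above).
HIDDEN_TAGS = ["style", "script", "head", "title", "meta"]


def visible_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    soup = BeautifulSoup(html, "html.parser")
    # Detach hidden elements from the tree so their text is not collected.
    for tag in soup(HIDDEN_TAGS):
        tag.extract()
    # get_text() skips HTML comments; strip=True trims every string
    # and drops the ones that are empty after trimming.
    return soup.get_text(separator=" ", strip=True)


# Made-up sample document, used only to exercise the function.
sample = """
<html>
  <head><title>Storm report</title><style>p { color: red; }</style></head>
  <body>
    <!-- editor note -->
    <h1>Heavy snow</h1>
    <p>Roads are <b>closed</b> today</p>
    <script>console.log("hidden");</script>
  </body>
</html>
"""

print(visible_text(sample))
```

With this sample, the title, stylesheet, script and comment should all be absent from the printed text, leaving only the heading and paragraph content.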


