PYTHON

Extract all visible text from a web page using BeautifulSoup and Python

from bs4 import BeautifulSoup
from bs4.element import Comment
import urllib.request


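def tag_visible(element):
    # Skip text nodes inside tags that browsers do not render as page text.
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    # Skip HTML comments (<!-- ... -->).
    if isinstance(element, Comment):
        return False
    return True


def text_from_html(body):
    soup = BeautifulSoup(body, 'html.parser')
    # string=True matches every text node; it replaces the deprecated findAll(text=True) spelling.
    texts = soup.find_all(string=True)
    visible_texts = filter(tag_visible, texts)
    # Strip each node and drop whitespace-only ones so the result has no runs of spaces.
    stripped = (t.strip() for t in visible_texts)
    return " ".join(s for s in stripped if s)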

html = urllib.request.urlopen('http://www.nytimes.com/2009/12/21/us/21storm.html').read()
print(text_from_html(html))
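
To see what the filter keeps and drops without making a network request, the same approach can be run on a small inline HTML string. This is only a sketch: the sample markup and the names sample_html and visible are made up for illustration, and it assumes a Beautiful Soup release that accepts find_all(string=True).

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical sample page: a title, a style rule, a comment and a script
# that should all be dropped, plus a heading and a paragraph that should stay.
sample_html = (
    "<html><head><title>Storm report</title>"
    "<style>p { color: red; }</style></head>"
    "<body><!-- hidden note --><h1>Heavy snow</h1>"
    "<p>Roads were <b>closed</b> overnight.</p>"
    "<script>var x = 1;</script></body></html>"
)


def tag_visible(element):
    # Text inside these tags is not rendered as page text.
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    # HTML comments are never visible.
    return not isinstance(element, Comment)


soup = BeautifulSoup(sample_html, 'html.parser')
pieces = (t.strip() for t in soup.find_all(string=True) if tag_visible(t))
visible = " ".join(p for p in pieces if p)
print(visible)  # Heavy snow Roads were closed overnight.
```

Because every text node is stripped and then joined with a single space, punctuation that directly follows an inline tag (for example "<b>closed</b>.") would come out as "closed ." with this approach.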