In this blog post, I will explain how to get only the text from a web page using Beautiful Soup 4. Beautiful Soup 4 provides the get_text() method, which returns all the text in a web page. Take this blog as an example: to get all the text in it, I can use the code below.

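from bs4 import BeautifulSoup
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2
# url must already hold the address of the page to fetch
page = urlopen(url)
soup = BeautifulSoup(page, "html.parser")  # name the parser explicitly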

The code above creates the soup object from the page at the given URL. To get all the text stored in the page, we can use the code below.

all_texts = soup.get_text()
print(all_texts)

In the above code, get_text() returns all the text content of the page as a single string.
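To try get_text() without fetching a live page, here is a minimal sketch that parses a small inline document. The HTML string and the variable names are made up for illustration. The separator and strip arguments are optional parameters of get_text() that help tidy up whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, standing in for a downloaded page.
html = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>Hello</h1>
    <p>First <b>bold</b> paragraph.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Default call: the text nodes are concatenated as-is, whitespace included.
raw_text = soup.get_text()

# separator joins the text nodes; strip=True trims each node and
# drops the ones that contain only whitespace.
clean_text = soup.get_text(separator=" ", strip=True)
print(clean_text)
```

With strip=True the result is usually much easier to read than the raw output, which keeps every newline and indent from the markup.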

Sample Output

Getting All Text From Web Page Using Beautiful Soup «

( function() {
var query = document.location.search;

if ( query && query.indexOf( 'preview=true' ) !== -1 ) { window.name = 'wp-preview-387';

if ( window.addEventListener ) {
window.addEventListener( 'unload', function() { window.name = ''; }, false );

But wait, the output includes the contents of script tags. How can we remove those?
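One common approach, sketched here as an assumption rather than the only answer, is to delete the script and style elements from the tree before calling get_text(). The inline HTML and the helper name visible_text are hypothetical. decompose() is the Beautiful Soup method that removes a tag together with its contents.

```python
from bs4 import BeautifulSoup

def visible_text(html):
    """Return the page text with <script> and <style> contents removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Delete each script/style element, including everything inside it.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

# Hypothetical page containing an inline script.
html = "<html><body><p>Visible text</p><script>var query = 1;</script></body></html>"
print(visible_text(html))
```

According to the Beautiful Soup documentation, versions 4.9.0 and later already leave script, style, and template contents out of get_text() when html.parser or lxml is used, so the explicit removal matters mostly for older versions or other parsers.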