Earworm
New 10th anniversary edition of my acclaimed novel about what happens when AI and the music business collide.
For issue 144 of The MagPi magazine, I created a productivity tool that downloads webpages for you and puts them into an office document you can easily search or edit.
This program can be a real time saver if you need to process a lot of online research. You enter a list of website addresses, and it downloads them all and puts them into a docx file you can open in Microsoft Word or LibreOffice. (I'm sure you're already familiar with one of these tools, but, if not, you can find a guide to Word in Microsoft Office for the Older and Wiser, and a guide to LibreOffice in Raspberry Pi For Dummies.)
The results aren't perfect: there are no images, and the document often includes unnecessary navigation elements. But you can easily delete anything you don't need, and the real power is how quickly you can skim-read or search across multiple web pages.
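If stray navigation text is a nuisance on the sites you use, one option is to remove more page elements before the text is extracted. The program's listing removes only nav and footer elements; the sketch below widens that list. The extra tag names and the strip_unwanted function are illustrative assumptions, not part of the original program, and the best list will vary from site to site.

```python
from bs4 import BeautifulSoup

# Assumption: the original program removes only "nav" and "footer".
# The other tag names here are illustrative additions.
UNWANTED_TAGS = ["nav", "footer", "header", "aside", "form"]


def strip_unwanted(html, unwanted=UNWANTED_TAGS):
    """Parse the HTML and return the tree with unwanted elements removed."""
    soup = BeautifulSoup(html, "html.parser")
    for element in soup.find_all(unwanted):
        element.extract()  # same removal call the original program uses
    return soup


sample = """<html><body>
<header><p>Site menu</p></header>
<p>Main article text.</p>
<aside><p>Related links</p></aside>
<footer><p>Copyright notice</p></footer>
</body></html>"""

cleaned = strip_unwanted(sample)
print([p.get_text() for p in cleaned.find_all("p")])
```

Removing header or aside elements can also discard content you want on some pages, so it is worth checking the output for a site before relying on a longer list.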
The program shows you:
On this webpage, you can download the code for the project.
For more information on how the code works, get issue 144 of The MagPi.
You'll need to install the bs4 and python-docx libraries.
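A quick way to confirm the libraries are in place before running the program is to ask Python whether it can find them. This is a minimal sketch using only the standard library; the missing_modules function is a hypothetical helper, and including requests in the list is an assumption based on the program's import line. Note that python-docx is imported under the name docx.

```python
from importlib.util import find_spec

# Import names, which can differ from install names:
# python-docx is imported as "docx", and bs4 provides BeautifulSoup.
REQUIRED_MODULES = ["requests", "bs4", "docx"]


def missing_modules(names):
    """Return the module names from the list that Python cannot find."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All required libraries are available.")
```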
In the Thonny Python editor:
# Download web pages into a docx file
# By Sean McManus - www.sean.co.uk
import requests, sys
from bs4 import BeautifulSoup
from docx import Document

print("Paste in the URLs (Ctrl-D to end input): ")
urls = sys.stdin.readlines()
# Strip whitespace and skip blank lines, which cannot be fetched
urls = [url.strip() for url in urls if url.strip()]
filename = "output.docx"
doc = Document()

for source_number, url in enumerate(urls):
    print(f"Fetching {url}")
    # The timeout stops one unresponsive site from stalling the whole run
    response = requests.get(url, timeout=30)
    content = response.content
    soup = BeautifulSoup(content, "html.parser")
    # Remove navigation and footer elements before extracting the text
    for remove_me in soup.find_all(["nav", "footer"]):
        remove_me.extract()
    doc.add_heading(f"{source_number + 1} - {url}", 1)
    # Fall back to the URL when the page has no usable <title>
    title = soup.title.string if soup.title and soup.title.string else url
    doc.add_heading(f"{source_number + 1} - {title}", 0)
    for part in soup.find_all(["p", "h1", "h2", "h3", "h4", "h5", "h6", "table", "li", "blockquote"]):
        if part.name in ["h1", "h2", "h3"]:
            doc.add_heading(part.text, 2)
        elif part.name == "li":
            doc.add_paragraph(part.text, style="List Bullet")
        elif part.text:
            doc.add_paragraph(part.text)
    # Start each web page on a new page of the document
    doc.add_page_break()

doc.save(filename)
print(f"Saved as {filename}")
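The program reads one web address per line from the pasted input. If your list comes from messy notes, a stricter clean-up step can help. The sketch below is one possible approach, not part of the original program: clean_urls is a hypothetical helper that drops blank lines and duplicates, skips anything that isn't a web address, and assumes https:// when no scheme is given.

```python
from urllib.parse import urlparse


def clean_urls(lines):
    """Return the usable web addresses from pasted lines, in their original order."""
    cleaned = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        if "://" not in url:
            url = "https://" + url  # assumption: default to HTTPS
        parsed = urlparse(url)
        is_web_address = parsed.scheme in ("http", "https") and parsed.netloc
        if is_web_address and url not in cleaned:
            cleaned.append(url)
    return cleaned


pasted = [
    "https://www.sean.co.uk\n",
    "\n",
    "  www.example.com  \n",
    "https://www.sean.co.uk\n",
    "ftp://example.com/file\n",
]
print(clean_urls(pasted))
# ['https://www.sean.co.uk', 'https://www.example.com']
```

To try it with the program, you could replace the line that strips each URL with urls = clean_urls(urls).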
© Sean McManus. All rights reserved.
Visit www.sean.co.uk for free chapters from Sean's coding books (including Mission Python, Scratch Programming in Easy Steps and Coder Academy) and more!