A ZSH script that downloads web content, strips formatting, and converts it to plain text.
web2context is a command-line tool that:
- Downloads a webpage and its immediate child pages
- Removes all JavaScript, CSS, and styling
- Converts HTML content to plain text
- Collects the resulting text files in a single directory
- Creates an aggregate file (ALL.txt) containing all text content
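The steps above could be sketched roughly as follows. This is a minimal illustration, not the actual web2context source: the function names (`fetch_pages`, `strip_html_sed`, `html_to_text`, `build_context`) are hypothetical, the sed fallback only handles lowercase `<script>` and `<style>` tags, and the functions are written in portable shell so they run under ZSH as well as other POSIX-style shells.

```shell
# Hypothetical helpers; the real script may be organized differently.

# Download a page plus the pages it links to directly (one level deep).
# Non-HTML files may also be fetched; filtering them is left out of this sketch.
fetch_pages() {
  url=$1
  out_dir=$2
  wget --quiet --recursive --level=1 --no-parent \
       --no-directories --adjust-extension \
       --directory-prefix="$out_dir" "$url"
}

# sed-only fallback: drop script/style blocks, strip tags, remove empty lines.
strip_html_sed() {
  sed -e 's/<script[^>]*>.*<\/script>//g' \
      -e 's/<style[^>]*>.*<\/style>//g' \
      -e '/<script/,/<\/script>/d' \
      -e '/<style/,/<\/style>/d' \
      -e 's/<[^>]*>//g' \
      -e 's/^[[:space:]]*//' \
      -e '/^$/d'
}

# Prefer lynx when it is installed, otherwise fall back to sed.
html_to_text() {
  if command -v lynx >/dev/null 2>&1; then
    lynx -dump -nolist -stdin
  else
    strip_html_sed
  fi
}

# Convert every .html file under src_dir to .txt in out_dir and
# append each result to out_dir/ALL.txt.
build_context() {
  src_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  : > "$out_dir/ALL.txt"
  find "$src_dir" -name '*.html' | while IFS= read -r page; do
    txt="$out_dir/$(basename "${page%.html}").txt"
    html_to_text < "$page" > "$txt"
    cat "$txt" >> "$out_dir/ALL.txt"
  done
}
```

Under these assumptions, `fetch_pages https://example.com raw` followed by `build_context raw text` would leave one `.txt` file per downloaded page in `text/`, plus the combined `text/ALL.txt`.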
Requirements:
- ZSH shell
- wget
- sed
- lynx (optional, but recommended for better HTML to text conversion)
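A quick way to confirm these tools are installed is a small pre-flight check. The `missing_tools` helper below is hypothetical and not part of web2context; it only relies on the standard `command -v` builtin.

```shell
# Hypothetical helper: print the name of every tool that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Required tools: report each one that is missing.
missing_tools zsh wget sed | while IFS= read -r tool; do
  echo "Missing required tool: $tool" >&2
done

# lynx is optional, so only print a note when it is absent.
if [ -n "$(missing_tools lynx)" ]; then
  echo "Note: lynx not found; HTML to text conversion may be lower quality." >&2
fi
```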
- Download the script:
curl -O https://raw.githubusercontent.com/yourusername/web2context/main/web2context