@reidransom
Created June 29, 2026 00:45
Research and document a WordPress website
name: wpdoc
description: Research and document a WordPress website. Crawls the public site HTML, logs into the WP admin via Playwright, and produces a comprehensive inventory of page types, content counts, plugins, media, frontend libraries, animations, and more. All scraped HTML and documentation are saved to a wpdocs/ subfolder in the current working directory. Use when asked to document a WordPress site, audit WP content, inventory a WP site, research a WordPress site before migration, catalogue what a WordPress site contains, or gather intel on a WP install.

WordPress Site Documenter

Research a WordPress website and save a complete inventory to wpdocs/ in the current working directory. This skill is research-only — it gathers and documents, it does not migrate or modify anything.

Output structure

Everything goes into wpdocs/ relative to the current working directory:

wpdocs/
├── html/                  # raw crawled HTML from the public site
│   └── <domain>/          # httrack mirror (HTML/CSS/JS only)
├── screenshots/           # admin dashboard screenshots
├── site-overview.md       # high-level summary of the site
├── pages-and-posts.md     # page types, post types, counts
├── plugins.md             # installed plugins with descriptions
├── theme.md               # active theme details
├── media.md               # uploaded media inventory
├── frontend-libraries.md  # CSS/JS frameworks and libraries
├── animations.md          # animations and interactive features
├── forms.md               # forms found on the site
├── headers-raw.txt        # raw output of curl -sI (Phase 3)
└── headers.md             # HTTP response header notes

Prerequisites

Before starting, confirm the working directory contains a .env file with:

WP_LOGIN_PAGE=https://example.com/wp-login.php
WP_USERNAME=...
WP_PASSWORD=...

If .env is missing or incomplete, ask the user for credentials before proceeding. Never echo credentials in shell output.
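One way to run that check without exposing any values is sketched below. The `check_env_keys` helper is a hypothetical name, not part of the skill; it only tests that each key is present with a non-empty value and never prints the values themselves.

```shell
# Hypothetical helper: report which required keys are missing from an env
# file. Only key names are ever printed, never their values.
check_env_keys() {
  env_file="${1:-.env}"
  missing=""
  for key in WP_LOGIN_PAGE WP_USERNAME WP_PASSWORD; do
    # Require "KEY=" followed by at least one character; -q keeps grep silent.
    if ! grep -Eq "^${key}=.+" "$env_file" 2>/dev/null; then
      missing="${missing}${key} "
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing or empty in ${env_file}: ${missing% }"
    return 1
  fi
  echo "All required keys present in ${env_file}"
}
```

Calling `check_env_keys` with no argument checks `.env` in the current directory. A non-zero exit status is the signal to ask the user for credentials.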

Phase 1 — Crawl the Public Site

Create the output directory first:

mkdir -p wpdocs/html wpdocs/screenshots

Use httrack to mirror the site HTML (no images or media):

httrack "<site-url>" \
  -O "wpdocs/html" \
  "+*.<domain>/*" \
  "-*?*" \
  "-*.jpg" "-*.jpeg" "-*.png" "-*.gif" "-*.svg" "-*.webp" \
  "-*.mp4" "-*.mp3" "-*.woff" "-*.woff2" "-*.ttf" "-*.ico" \
  -v

Replace <site-url> with the public URL and <domain> with the bare domain (e.g. example.com).
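If only the site URL is known, the bare domain can be derived from it with plain parameter expansion. This is a minimal sketch; `bare_domain` is a hypothetical helper name, and dropping a leading `www.` is an assumption about what "bare domain" means here.

```shell
# Hypothetical helper: reduce a URL to its bare domain.
bare_domain() {
  host="${1#*://}"     # drop the scheme ("https://" or "http://")
  host="${host%%/*}"   # drop the path
  host="${host%%:*}"   # drop an explicit port
  host="${host#www.}"  # drop a leading "www." (assumption, see above)
  printf '%s\n' "$host"
}
```

For example, `bare_domain "https://www.example.com/blog/"` prints `example.com`.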

After the crawl finishes, analyse the downloaded HTML to extract:

  1. Total page count — unique URLs crawled.
  2. Page type classification — inspect <body> class attributes, URL path patterns, and structural HTML to identify distinct page templates (e.g. home, about, blog listing, single post, portfolio item, landing page, contact, 404).
  3. Collection content — repeating URL patterns with similar HTML (blog posts, portfolio items, team members, products, testimonials).
  4. Frontend libraries — scan <link> and <script> tags across pages for CSS/JS frameworks (Bootstrap, Tailwind, Foundation, jQuery, React, Vue, GSAP, AOS, Slick, Swiper, Lottie, etc.). Note CDN URLs and local paths. Check for bundled/minified files and try to identify what they contain.
  5. Animations & interactive features — look for: CSS animations and transitions (keyframes, transition properties), JS animation libraries (GSAP, AOS, Animate.css, ScrollMagic, Lottie), parallax effects, scroll-triggered animations, hover effects, page transition effects, carousels/sliders, modals/lightboxes, accordions, tabs, sticky elements, lazy loading, infinite scroll, AJAX content loading.
  6. Forms — count <form> elements, note fields and likely purpose (contact, newsletter, search, login).
  7. Embedded content — iframes (maps, videos, calendars, social feeds), third-party widgets, chat widgets.
  8. CSS frameworks — identify any grid/utility frameworks in use.
  9. Meta & SEO — check for structured data (JSON-LD, microdata), Open Graph tags, canonical URLs, sitemap references.
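A first pass over items 1, 2, and 4 can be done with standard tools before reading individual pages. The helpers below are an illustrative sketch, not part of the skill: the function names are hypothetical, they assume a `grep` that supports `-r` and `--include` (GNU and BSD both do), and they only see attributes written with double quotes. Because httrack rewrites links inside the mirror, local asset paths may appear in relative form.

```shell
# Hypothetical helpers for a quick pass over the httrack mirror.

# Number of HTML files in the mirror (a rough proxy for pages crawled).
count_pages() {
  find "$1" -type f -name '*.html' | wc -l | tr -d ' '
}

# Tally <body> class tokens; WordPress themes usually emit tokens such as
# "home", "single-post", or "page-template-..." that hint at the template.
body_class_counts() {
  grep -rhoiE --include='*.html' '<body[^>]*class="[^"]*"' "$1" \
    | sed -E 's/.*class="([^"]*)"/\1/' \
    | tr -s ' ' '\n' \
    | grep -v '^$' \
    | sort | uniq -c | sort -rn
}

# Tally .js and .css URLs referenced from <script> and <link> tags.
list_assets() {
  grep -rhoiE --include='*.html' \
    '<(script|link)[^>]*(src|href)="[^"]+\.(js|css)[^"]*"' "$1" \
    | sed -E 's/.*(src|href)="([^"]*)"/\2/' \
    | sort | uniq -c | sort -rn
}
```

Running `body_class_counts wpdocs/html` and `list_assets wpdocs/html` gives frequency-sorted lists. Tokens or assets that appear on only a few pages are often the quickest pointer to distinct templates or page-specific libraries.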

Phase 2 — Scrape the WP Admin via Playwright

Use the Playwright MCP tools. Read .env to get credentials:

source .env

Then use Playwright MCP to visit each admin page below. For every page, take a screenshot (save to wpdocs/screenshots/) and a snapshot to extract text data.

Login

  1. Navigate to $WP_LOGIN_PAGE
  2. Fill username and password fields, click Log In

Dashboard

  1. Snapshot + screenshot the Dashboard home. Note:
    • WordPress version
    • At-a-glance widget: post count, page count, comment count
    • Any update notices or warnings
    • Site health status if shown

Plugins

  1. Navigate to /wp-admin/plugins.php — snapshot the full plugin list. For each plugin record:
    • Name
    • Version
    • Active or inactive
    • Whether an update is available
    • Brief description of what the plugin does (from the description shown on the page, or infer from the plugin name if not visible)

Theme

  1. Navigate to /wp-admin/themes.php — snapshot. Record:
    • Active theme name, version, author
    • Whether it's a custom, vendor, or child theme
    • Any inactive themes installed

Posts

  1. Navigate to /wp-admin/edit.php — snapshot. Record:
    • Total published posts
    • Draft count if visible
    • Categories and tags in use (navigate to /wp-admin/edit-tags.php?taxonomy=category and /wp-admin/edit-tags.php?taxonomy=post_tag if needed)

Pages

  1. Navigate to /wp-admin/edit.php?post_type=page — snapshot. Record:
    • Total published pages
    • Draft count if visible
    • List page titles and note which template each uses if visible

Custom Post Types

  1. Inspect the admin sidebar for menu items beyond Posts / Pages / Media / Comments. Navigate to each custom post type list page and record:
    • Post type name/slug
    • Number of published items
    • Sample of field names if visible (ACF, custom fields, etc.)

Media Library

  1. Navigate to /wp-admin/upload.php — snapshot. Record:
    • Total number of media items (shown in the media library header)
    • If possible, note the breakdown by type (images, documents, audio, video) by checking the filter dropdown
    • Total disk usage for all media, if available (the Site Health Info tab reports an uploads directory size under Directories and Sizes)

Users

  1. Navigate to /wp-admin/users.php — snapshot. Note:
    • Total user count
    • Roles in use (admin, editor, author, etc.)

Settings

  1. Navigate to /wp-admin/options-general.php — snapshot. Note:
    • Site title, tagline
    • WordPress address and site address URLs
    • Timezone, date/time format

Site Health

  1. Navigate to /wp-admin/site-health.php — snapshot. Note:
    • PHP version (server and database details are listed on the Info tab, /wp-admin/site-health.php?tab=debug)
    • Database version (MySQL/MariaDB)
    • Web server
    • Any critical issues or recommendations

Updates

  1. Navigate to /wp-admin/update-core.php — snapshot. Note:
    • Pending core updates
    • Pending plugin updates (count)
    • Pending theme updates (count)

If login fails or any page returns an error, report what is and isn't accessible and proceed with whatever data you have.

Phase 3 — HTTP Response Headers

Run:

curl -sI "<site-url>" > wpdocs/headers-raw.txt

Note anything interesting: server software, CDN/WAF headers (Cloudflare, Sucuri, etc.), caching headers, PHP version exposure, X-Powered-By, etc.
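To pull the usual suspects out of the raw file, a filter along these lines may help. It is a sketch only: `notable_headers` is a hypothetical name, and the header list is an illustrative starting point rather than a complete catalogue, so the raw file should still be read in full.

```shell
# Hypothetical helper: print headers that commonly reveal server software,
# CDN/WAF presence, caching, or security policy. The name list is
# illustrative, not exhaustive.
notable_headers() {
  grep -iE '^(server|x-powered-by|via|cf-ray|cf-cache-status|x-sucuri-id|x-cache|cache-control|age|expires|strict-transport-security|content-security-policy|x-frame-options):' \
    "${1:-wpdocs/headers-raw.txt}" \
    | tr -d '\r'   # curl keeps the CRLF line endings from the response
}
```

With no argument, `notable_headers` reads `wpdocs/headers-raw.txt`.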

Phase 4 — Write Documentation

Using all the data gathered, write the following markdown files into wpdocs/. Each file should be clear, scannable, and factual — just report what you found. Use headings, lists, and counts. Don't editorialize.
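Once the files are written, a check like the following can confirm that none were skipped. It is a minimal sketch; `missing_docs` is a hypothetical helper, and the file list simply mirrors the output structure described earlier.

```shell
# Hypothetical helper: print the name of every expected documentation file
# that is absent or empty. No output means all nine files are in place.
missing_docs() {
  dir="${1:-wpdocs}"
  for f in site-overview pages-and-posts plugins theme media \
           frontend-libraries animations forms headers; do
    [ -s "$dir/$f.md" ] || printf '%s\n' "$f.md"
  done
}
```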

site-overview.md

  • Site URL
  • Date of audit
  • WordPress version
  • PHP version
  • Server / hosting clues (from headers)
  • Active theme (name, version, type)
  • Total pages, total posts, total custom post type items
  • Total plugins (active / inactive)
  • Total media files
  • Number of forms found
  • Number of distinct page templates identified
  • Quick note on overall site complexity (simple / moderate / complex)

pages-and-posts.md

For each content type (pages, posts, each custom post type):

  • Type name
  • Total count (published / draft)
  • Sample titles (up to 10)
  • URL pattern
  • Template used (if identifiable)
  • Notable fields or structured data

Also include a section on page templates — list each distinct template found from the HTML crawl with a description of its layout and which pages use it.

plugins.md

A list of every plugin, formatted as:

## Plugin Name (vX.X.X) — Active/Inactive
Brief description of what the plugin does.
Update available: Yes/No

Group plugins by category where it makes sense:

  • SEO
  • Security
  • Performance / Caching
  • Forms
  • Page builders
  • E-commerce
  • Media / Gallery
  • Social
  • Analytics
  • Backup
  • Other / Utility

theme.md

  • Active theme name, version, author, URL
  • Parent theme (if child theme)
  • Customiser settings observed
  • Notable template files or custom functionality
  • Any theme-specific plugins or requirements

media.md

  • Total media file count
  • Breakdown by type if available (images, documents, video, audio)
  • Estimated volume (if inferable from media library pagination)
  • Notable observations (e.g. very large library, unoptimised images)

frontend-libraries.md

List every CSS and JS library/framework detected:

## Library Name (vX.X.X)
- Source: CDN / local / bundled
- URL or file path
- Purpose: what it does on the site

Include: CSS frameworks, JS frameworks, animation libraries, utility libraries, font loading, icon sets (Font Awesome, etc.).

animations.md

Document every animation or interactive behaviour found:

  • CSS animations/transitions (describe the effect and which elements)
  • JS-driven animations (library used, trigger, effect)
  • Scroll-triggered effects
  • Parallax layers
  • Hover states (beyond basic color changes)
  • Page/route transitions
  • Loading animations
  • Carousels/sliders (library, number of instances)
  • Modals/lightboxes
  • Accordions, tabs, toggles
  • Any other notable interactive UI

forms.md

For each form found:

  • Location (which page)
  • Likely purpose (contact, newsletter, search, login, registration)
  • Fields list (name, email, phone, message, file upload, etc.)
  • Submission handler (Contact Form 7, Gravity Forms, WPForms, custom, mailto, etc.)
  • Notable features (CAPTCHA, conditional fields, multi-step)
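To find which crawled pages contain forms in the first place, a per-page count over the mirror is a reasonable starting point. The sketch below uses a hypothetical `forms_per_page` helper. It counts opening `<form` tags in the static HTML only, so forms injected by JavaScript after load will not show up.

```shell
# Hypothetical helper: print "<count><TAB><file>" for every mirrored page
# that contains at least one <form> element.
forms_per_page() {
  find "$1" -type f -name '*.html' | sort | while IFS= read -r page; do
    # grep -o emits one line per match, so minified single-line HTML
    # with several forms is still counted correctly.
    n="$(grep -oiE '<form[[:space:]>]' "$page" | wc -l | tr -d ' ')"
    if [ "$n" -gt 0 ]; then
      printf '%s\t%s\n' "$n" "$page"
    fi
  done
}
```

Running `forms_per_page wpdocs/html` lists the pages worth opening to record fields and the submission handler.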

headers.md

Paste the raw response headers and annotate anything notable:

  • Server software
  • CDN / WAF / proxy
  • Caching behaviour
  • Security headers (CSP, HSTS, X-Frame-Options, etc.)
  • PHP version exposure
  • Any custom headers