| name | wpdoc |
|---|---|
| description | Research and document a WordPress website. Crawls the public site HTML, logs into the WP admin via Playwright, and produces a comprehensive inventory of page types, content counts, plugins, media, frontend libraries, animations, and more. All scraped HTML and documentation are saved to a wpdocs/ subfolder in the current working directory. Use when asked to document a WordPress site, audit WP content, inventory a WP site, research a WordPress site before migration, catalogue what a WordPress site contains, or gather intel on a WP install. |
Research a WordPress website and save a complete inventory to wpdocs/
in the current working directory. This skill is research-only — it gathers
and documents, it does not migrate or modify anything.
Everything goes into wpdocs/ relative to the current working directory:
wpdocs/
├── html/ # raw crawled HTML from the public site
│ └── <domain>/ # httrack mirror (HTML/CSS/JS only)
├── screenshots/ # admin dashboard screenshots
├── site-overview.md # high-level summary of the site
├── pages-and-posts.md # page types, post types, counts
├── plugins.md # installed plugins with descriptions
├── theme.md # active theme details
├── media.md # uploaded media inventory
├── frontend-libraries.md # CSS/JS frameworks and libraries
├── animations.md # animations and interactive features
├── forms.md # forms found on the site
├── headers.md # HTTP response header notes
└── headers-raw.txt # raw response headers saved by curl
Before starting, confirm the working directory contains a .env file with:
WP_LOGIN_PAGE=https://example.com/wp-login.php
WP_USERNAME=...
WP_PASSWORD=...
If .env is missing or incomplete, ask the user for credentials before
proceeding. Never echo credentials in shell output.
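
The .env check above could be sketched as a small shell function. This is an illustration only: the function name check_env is hypothetical, and it assumes plain KEY=value lines (no export prefix, no quoting rules). It reports key names only, so values never reach the terminal.

```shell
# Report which required keys are missing or empty in an env file.
# Prints key names only, never values. Returns 1 if anything is missing.
# Assumes plain KEY=value lines; "export KEY=value" would not be recognised.
check_env() {
  local file="$1" missing=0 key
  if [ ! -f "$file" ]; then
    echo "missing file: $file"
    return 1
  fi
  for key in WP_LOGIN_PAGE WP_USERNAME WP_PASSWORD; do
    # -q keeps the matched line (and therefore the value) out of the output.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `check_env .env || echo "ask the user for credentials"`.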
Create the output directory first:
mkdir -p wpdocs/html wpdocs/screenshots

Use httrack to mirror the site HTML (no images or media):
httrack "<site-url>" \
-O "wpdocs/html" \
"+*.<domain>/*" \
"-*?*" \
"-*.jpg" "-*.jpeg" "-*.png" "-*.gif" "-*.svg" "-*.webp" \
"-*.mp4" "-*.mp3" "-*.woff" "-*.woff2" "-*.ttf" "-*.ico" \
-v

Replace <site-url> with the public URL and <domain> with the bare
domain (e.g. example.com).
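
If only the public URL is known, the bare domain can be derived with shell parameter expansion. A minimal sketch: the function name bare_domain is hypothetical, and stripping a leading www. is an assumption about what "bare" means here.

```shell
# Derive a bare domain from a URL using only parameter expansion.
# Assumption: "bare" means no scheme, path, port, or leading "www.".
bare_domain() {
  local host="$1"
  host="${host#*://}"   # drop the scheme, if any
  host="${host%%/*}"    # drop the path
  host="${host%%:*}"    # drop the port
  host="${host#www.}"   # drop a leading www.
  printf '%s\n' "$host"
}
```

For example, `bare_domain "https://www.example.com/blog/"` prints `example.com`.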
After the crawl finishes, analyse the downloaded HTML to extract:
- Total page count: unique URLs crawled.
- Page type classification: inspect <body> class attributes, URL path patterns, and structural HTML to identify distinct page templates (e.g. home, about, blog listing, single post, portfolio item, landing page, contact, 404).
- Collection content: repeating URL patterns with similar HTML (blog posts, portfolio items, team members, products, testimonials).
- Frontend libraries: scan <link> and <script> tags across pages for CSS/JS frameworks (Bootstrap, Tailwind, Foundation, jQuery, React, Vue, GSAP, AOS, Slick, Swiper, Lottie, etc.). Note CDN URLs and local paths. Check for bundled/minified files and try to identify what they contain.
- Animations & interactive features: look for CSS animations and transitions (keyframes, transition properties), JS animation libraries (GSAP, AOS, Animate.css, ScrollMagic, Lottie), parallax effects, scroll-triggered animations, hover effects, page transition effects, carousels/sliders, modals/lightboxes, accordions, tabs, sticky elements, lazy loading, infinite scroll, AJAX content loading.
- Forms: count <form> elements, note fields and likely purpose (contact, newsletter, search, login).
- Embedded content: iframes (maps, videos, calendars, social feeds), third-party widgets, chat widgets.
- CSS frameworks: identify any grid/utility frameworks in use.
- Meta & SEO: check for structured data (JSON-LD, microdata), Open Graph tags, canonical URLs, sitemap references.
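
A first pass over the mirror can be done with find and grep before reading pages by hand. The sketch below is a rough heuristic, not a real HTML parser: the function names are hypothetical, it assumes pages are saved with an .html extension, and it only sees double-quoted attributes that sit on the same line as their tag.

```shell
# Count mirrored HTML files under a directory.
count_pages() {
  echo $(( $(find "$1" -type f -name '*.html' | wc -l) ))
}

# Tally <body> classes across all pages, most frequent first.
# WordPress themes usually emit template hints here (home, single-post, ...).
body_class_counts() {
  grep -rhoE --include='*.html' '<body[^>]*class="[^"]*"' "$1" \
    | sed -E 's/.*class="([^"]*)"/\1/' \
    | tr -s ' ' '\n' \
    | sort | uniq -c | sort -rn
}

# Tally <script src="..."> values across all pages, most frequent first.
script_sources() {
  grep -rhoE --include='*.html' '<script[^>]*src="[^"]*"' "$1" \
    | sed -E 's/.*src="([^"]*)"/\1/' \
    | sort | uniq -c | sort -rn
}
```

Running `body_class_counts wpdocs/html` gives a quick frequency table to start the page type classification from; the counts are a starting point and still need checking against the actual pages.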
Use the Playwright MCP tools. Read .env to get credentials:
source .env

Then use Playwright MCP to visit each admin page below. For every page,
take a screenshot (save to wpdocs/screenshots/) and a snapshot
to extract text data.
- Navigate to $WP_LOGIN_PAGE
- Fill username and password fields, click Log In
- Snapshot + screenshot the Dashboard home. Note:
  - WordPress version
  - At-a-glance widget: post count, page count, comment count
  - Any update notices or warnings
  - Site health status if shown
- Navigate to /wp-admin/plugins.php and snapshot the full plugin list. For each plugin record:
  - Name
  - Version
  - Active or inactive
  - Whether an update is available
  - Brief description of what the plugin does (from the description shown on the page, or infer from the plugin name if not visible)
- Navigate to /wp-admin/themes.php and take a snapshot. Record:
  - Active theme name, version, author
  - Whether it's a custom, vendor, or child theme
  - Any inactive themes installed
- Navigate to /wp-admin/edit.php and take a snapshot. Record:
  - Total published posts
  - Draft count if visible
  - Categories and tags in use (navigate to /wp-admin/edit-tags.php?taxonomy=category and /wp-admin/edit-tags.php?taxonomy=post_tag if needed)
- Navigate to /wp-admin/edit.php?post_type=page and take a snapshot. Record:
  - Total published pages
  - Draft count if visible
  - List page titles and note which template each uses if visible
- Inspect the admin sidebar for menu items beyond Posts / Pages / Media / Comments. Navigate to each custom post type list page and record:
  - Post type name/slug
  - Number of published items
  - Sample of field names if visible (ACF, custom fields, etc.)
- Navigate to /wp-admin/upload.php and take a snapshot. Record:
  - Total number of media items (shown in the media library header)
  - If possible, note the breakdown by type (images, documents, audio, video) by checking the filter dropdown
  - Total disk usage for all media, if available (the Site Health Info tab lists the uploads directory size under Directories and Sizes)
- Navigate to /wp-admin/users.php and take a snapshot. Note:
  - Total user count
  - Roles in use (admin, editor, author, etc.)
- Navigate to /wp-admin/options-general.php and take a snapshot. Note:
  - Site title, tagline
  - WordPress address and site address URLs
  - Timezone, date/time format
- Navigate to /wp-admin/site-health.php and take a snapshot. Server details are listed on the Info tab (/wp-admin/site-health.php?tab=debug). Note:
  - PHP version
  - Database version (MySQL/MariaDB)
  - Web server
  - Any critical issues or recommendations
- Navigate to /wp-admin/update-core.php and take a snapshot. Note:
  - Pending core updates
  - Pending plugin updates (count)
  - Pending theme updates (count)
If login fails or any page returns an error, report what is and isn't accessible and proceed with whatever data you have.
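
The admin paths above are relative to the site root, which can usually be derived from WP_LOGIN_PAGE. A minimal sketch, assuming the login URL ends in /wp-login.php (a custom login URL would need the base set by hand); the function name admin_urls is hypothetical.

```shell
# Print the absolute admin URLs to visit, derived from the login page URL.
# Assumption: the login URL contains /wp-login.php; everything from that
# point on (including any query string) is stripped to get the site root.
admin_urls() {
  local base="${1%/wp-login.php*}" path
  for path in \
    plugins.php themes.php edit.php \
    'edit.php?post_type=page' upload.php users.php \
    options-general.php site-health.php update-core.php; do
    printf '%s/wp-admin/%s\n' "$base" "$path"
  done
}
```

After `source .env`, `admin_urls "$WP_LOGIN_PAGE"` prints one URL per line, which also works for installs in a subdirectory such as https://example.com/blog/wp-login.php.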
Run:
curl -sI "<site-url>" > wpdocs/headers-raw.txt

Note anything interesting: server software, CDN/WAF headers (Cloudflare, Sucuri, etc.), caching headers, PHP version exposure, X-Powered-By, etc.
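
To pull the usual suspects out of the saved headers, a case-insensitive grep is often enough. This is a sketch: the function name notable_headers is hypothetical, and the header list is a starting selection rather than a complete one, so the raw file should still be read in full.

```shell
# Print commonly interesting response headers from a saved curl -I dump.
# curl writes CRLF line endings, so carriage returns are stripped first.
notable_headers() {
  tr -d '\r' < "$1" \
    | grep -iE '^(server|x-powered-by|via|cf-ray|cf-cache-status|x-sucuri-id|x-cache|cache-control|expires|age|strict-transport-security|content-security-policy|x-frame-options|x-content-type-options):'
}
```

For example, `notable_headers wpdocs/headers-raw.txt` lists server, CDN/WAF, caching, and security headers in the order they appear in the response.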
Using all the data gathered, write the following markdown files into
wpdocs/. Each file should be clear, scannable, and factual — just
report what you found. Use headings, lists, and counts. Don't editorialize.

site-overview.md should cover:
- Site URL
- Date of audit
- WordPress version
- PHP version
- Server / hosting clues (from headers)
- Active theme (name, version, type)
- Total pages, total posts, total custom post type items
- Total plugins (active / inactive)
- Total media files
- Number of forms found
- Number of distinct page templates identified
- Quick note on overall site complexity (simple / moderate / complex)

pages-and-posts.md: for each content type (pages, posts, each custom post type), record:
- Type name
- Total count (published / draft)
- Sample titles (up to 10)
- URL pattern
- Template used (if identifiable)
- Notable fields or structured data
Also include a section on page templates — list each distinct template found from the HTML crawl with a description of its layout and which pages use it.

plugins.md: a list of every plugin, formatted as:
## Plugin Name (vX.X.X) — Active/Inactive
Brief description of what the plugin does.
Update available: Yes/No
Group plugins by category where it makes sense:
- SEO
- Security
- Performance / Caching
- Forms
- Page builders
- E-commerce
- Media / Gallery
- Social
- Analytics
- Backup
- Other / Utility

theme.md should cover:
- Active theme name, version, author, URL
- Parent theme (if child theme)
- Customiser settings observed
- Notable template files or custom functionality
- Any theme-specific plugins or requirements

media.md should cover:
- Total media file count
- Breakdown by type if available (images, documents, video, audio)
- Estimated volume (if inferable from media library pagination)
- Notable observations (e.g. very large library, unoptimised images)

frontend-libraries.md: list every CSS and JS library/framework detected:
## Library Name (vX.X.X)
- Source: CDN / local / bundled
- URL or file path
- Purpose: what it does on the site
Include: CSS frameworks, JS frameworks, animation libraries, utility libraries, font loading, icon sets (Font Awesome, etc.).

animations.md: document every animation or interactive behaviour found:
- CSS animations/transitions (describe the effect and which elements)
- JS-driven animations (library used, trigger, effect)
- Scroll-triggered effects
- Parallax layers
- Hover states (beyond basic color changes)
- Page/route transitions
- Loading animations
- Carousels/sliders (library, number of instances)
- Modals/lightboxes
- Accordions, tabs, toggles
- Any other notable interactive UI

forms.md: for each form found, record:
- Location (which page)
- Likely purpose (contact, newsletter, search, login, registration)
- Fields list (name, email, phone, message, file upload, etc.)
- Submission handler (Contact Form 7, Gravity Forms, WPForms, custom, mailto, etc.)
- Notable features (CAPTCHA, conditional fields, multi-step)

headers.md: paste the raw response headers (from wpdocs/headers-raw.txt) and annotate anything notable:
- Server software
- CDN / WAF / proxy
- Caching behaviour
- Security headers (CSP, HSTS, X-Frame-Options, etc.)
- PHP version exposure
- Any custom headers