
axllent/web-validator


Validate website HTML & CSS, check links & resources


A command-line website validator for Linux, Mac & Windows that can spider through a website, validate the HTML & CSS of every page, check the existence of all linked assets (images, CSS, fonts etc), and verify outbound links.

Features

  • Check a single URL, crawl to a given depth, or scan an entire website
  • HTML & CSS validation via the Nu Html Checker (the public instance by default)
  • Detect & check assets linked from HTML & linked CSS (fonts, favicons, images, videos, etc)
  • Detect mixed content (HTTPS => HTTP) in linked assets (fonts, images, CSS, JS etc)
  • Verify outbound links (to external websites)
  • Summary report of errors (& optionally HTML/CSS warnings)
  • Multiple output formats: text, json, csv, html
  • Seed URLs from sitemap.xml
  • Skip specific domains (and subdomains) from validation
  • Obeys robots.txt (can be overridden with -n)

Usage options

Usage: web-validator [options] <url>

Options:
  -a, --all                        recursive, follow all internal links (default single URL)
  -d, --depth int                  crawl depth ("-a" will override this)
  -o, --outbound                   check outbound links (HEAD only)
      --html                       validate HTML
      --css                        validate CSS
  -i, --ignore string              ignore URLs, comma-separated, wildcards allowed (*.jpg,example.com)
      --skip-domains string        skip domains (and subdomains), comma-separated (default "linkedin.com,google.com,cloudflare.com")
  -s, --sitemap                    seed URLs from /sitemap.xml (silently skipped if not found)
  -n, --no-robots                  ignore robots.txt (if it exists)
  -r, --redirects                  treat redirects as errors
  -w, --warnings                   display validation warnings (default errors only)
  -f, --full                       full scan (same as "-a -r -o --html --css")
      --output string              output format: text, json, csv, html (default "text")
      --crawl-delay duration       delay between crawl requests, e.g. 500ms, 1s
      --validator-delay duration   delay between validator requests, e.g. 500ms, 1s (default 1s)
  -t, --threads int                number of threads (default 5)
      --timeout int                timeout in seconds (default 10)
      --validator string           Nu Html validator (default "https://validator.w3.org/nu/")
  -u, --update                     update to latest release
  -v, --version                    show app version

Examples

  • web-validator https://example.com/ - scan URL, verify all direct assets & links
  • web-validator https://example.com/ --css --html - scan URL, verify all direct assets & links, validate HTML & CSS
  • web-validator https://example.com/ -a - scan entire site, verify assets & links
  • web-validator https://example.com/ --css --html -d 2 - scan the site to a depth of 2 internal links, verify assets & links, validate HTML & CSS
  • web-validator https://example.com/ -a -o - scan entire site, verify all assets, verify outbound links
  • web-validator https://example.com/ -f - scan entire site, verify all assets, verify outbound links, validate HTML & CSS
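
These flags compose freely. For instance (the URL and ignore patterns below are purely illustrative), to crawl an entire site seeded from its sitemap, skip PDF and ZIP downloads, and write a JSON report:

web-validator https://example.com/ -a -s -i "*.pdf,*.zip" --output json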

Installing

Download the latest binary release for your system, or build from source with go install github.com/axllent/web-validator@latest (requires Go).

FAQ

When I scan a single page, web-validator scans many other pages too

When scanning a single page, web-validator still checks every internal link on that page, which includes both pages and files. Linked pages only receive a HEAD request (no validation etc) to confirm they return a valid response.

Web-validator says some of my outbound links are broken, however they do work

Some sites block all HEAD requests, in which case web-validator retries with a regular GET request. Other sites, such as LinkedIn, go to great lengths to prevent any kind of scraping and will always return an error response. Several such problematic domains are skipped by default via the --skip-domains flag (linkedin.com, google.com and cloudflare.com, along with all their subdomains).
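
For reference, the fallback behaviour looks roughly like the Go sketch below. This is an illustration of the technique only, not web-validator's actual source; the URL, the 405-only retry condition and the function name are assumptions.

package main

import (
	"fmt"
	"net/http"
	"time"
)

// checkLink tries a HEAD request first and falls back to a full GET when
// the server rejects HEAD (405 Method Not Allowed) or the request fails.
func checkLink(client *http.Client, url string) (int, error) {
	resp, err := client.Head(url)
	if err == nil && resp.StatusCode != http.StatusMethodNotAllowed {
		resp.Body.Close()
		return resp.StatusCode, nil
	}
	if resp != nil {
		resp.Body.Close()
	}
	// HEAD was blocked or failed: some servers only answer GET.
	resp, err = client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// 10s mirrors the documented default --timeout.
	client := &http.Client{Timeout: 10 * time.Second}
	status, err := checkLink(client, "https://example.com/")
	if err != nil {
		fmt.Println("broken link:", err)
		return
	}
	fmt.Println("status:", status)
}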

HTML/CSS validation

Validation uses the Nu Html validator, defaulting to the online public service (whose maintainers encourage this use). You can however run your own instance of the validator (it is open source) and point to it with --validator <your-server>.
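
For example, assuming you run the Nu validator's published Docker image locally (it listens on port 8888 by default; adjust the image name and port to your setup):

docker run --rm -p 8888:8888 ghcr.io/validator/validator:latest
web-validator https://example.com/ --html --css --validator http://localhost:8888/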

Robots.txt

By default, web-validator obeys Disallow rules in robots.txt if it exists. You can skip this by adding -n to your runtime flags. To add rules for just the validator, target it with User-agent: web-validator, e.g.:

User-agent: web-validator
Disallow: /assets/Products/*
