Redirect Link Remediation Script¶
Overview¶
update_redirects.py is an autonomous Python script that updates redirected URLs in Markdown documentation files. It identifies HTTP redirects from Sphinx’s linkcheck output, validates the target URLs, and performs safe bulk replacements across the documentation.
Purpose¶
Documentation URLs that redirect (HTTP 301, 302, etc.) add unnecessary HTTP overhead and can become a maintenance problem. This script:
Identifies all redirected URLs from linkcheck output
Validates that target URLs are not broken
Updates Markdown files with the final destination URLs
Verifies all changes resulted in valid links
Generates comprehensive audit reports
Requirements¶
Python 3.6+
Standard library only (no external dependencies)
A Sphinx documentation project with make linkcheck support
Markdown (.md) source files
Usage¶
Ensure the update-redirects.yml file is in your .github/workflows directory.
Basic Workflow¶
Assuming your docs live in a docs/ directory at the root of your project, and update_redirects.py is also at the root:
Generate initial linkcheck report
cd docs
make linkcheck > linkcheck.txt
cd ..
Run the script
python3 update_redirects.py
The script will:
Parse linkcheck output
Update all Markdown files
Run validation linkcheck
Generate reports
Clean up temporary files
Manual Step-by-Step¶
If you prefer more control:
# 1. Generate linkcheck output
cd docs && make linkcheck > linkcheck.txt && cd ..
# 2. Run the script (it handles everything else automatically)
python3 update_redirects.py
# 3. Review the reports
cat redirect_update_report.md
cat redirect_changes_summary.json
How It Works¶
1. Parse Linkcheck Output¶
The script reads docs/_build/output.txt (generated by Sphinx linkcheck) and extracts:
All broken URLs (to avoid replacing with broken links)
All redirected URLs with their target destinations
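The parsing step can be sketched as follows. This is a minimal illustration, not the script's actual parse_linkcheck_output() implementation. It assumes lines shaped like Sphinx's linkcheck output.txt entries (`file:line: [redirected ...] old to new` and `file:line: [broken] url: message`), and the names `parse_linkcheck_lines`, `REDIRECT_RE`, and `BROKEN_RE` are hypothetical.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumed line shapes (modeled on Sphinx's linkcheck output.txt):
#   path/to/file.md:12: [redirected permanently] http://old to https://new
#   path/to/file.md:30: [broken] http://dead: 404 Client Error
# The ([^\s:]+\.md) group restricts matches to Markdown sources.
REDIRECT_RE = re.compile(r"^([^\s:]+\.md):\d+: \[redirected[^\]]*\] (\S+) to (\S+)")
BROKEN_RE = re.compile(r"^([^\s:]+\.md):\d+: \[broken\] (\S+?):(?: |$)")


def parse_linkcheck_lines(
    lines: Iterable[str],
) -> Tuple[Set[str], List[Tuple[str, str, str]]]:
    """Return (broken_urls, redirects) where redirects are (file, old, new)."""
    broken: Set[str] = set()
    redirects: List[Tuple[str, str, str]] = []
    for raw in lines:
        line = raw.strip()
        match = REDIRECT_RE.match(line)
        if match:
            redirects.append((match.group(1), match.group(2), match.group(3)))
            continue
        match = BROKEN_RE.match(line)
        if match:
            broken.add(match.group(2))
    return broken, redirects
```

Lines for non-Markdown sources (for example .rst files) do not match either pattern and are ignored.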
2. Safety Validation¶
Critical safety check: Before updating any redirect, the script verifies that the target URL is NOT in the list of broken URLs. This prevents “fixing” a redirect only to point to a broken link.
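As a rough sketch of this check, the function below separates redirects into those that are safe to apply and those to skip. It assumes redirects are represented as (file, old_url, new_url) tuples and broken URLs as a set of strings. The function name and the shape of the skipped records are hypothetical and may not match the script.

```python
from typing import Dict, Iterable, List, Set, Tuple

Redirect = Tuple[str, str, str]  # (file, old_url, new_url); assumed shape


def split_by_target_health(
    redirects: Iterable[Redirect], broken_urls: Set[str]
) -> Tuple[List[Redirect], List[Dict[str, str]]]:
    """Keep redirects whose target is not broken; record the rest as skipped."""
    safe: List[Redirect] = []
    skipped: List[Dict[str, str]] = []
    for file_path, old_url, new_url in redirects:
        if new_url in broken_urls:
            # Never point a link at a URL that linkcheck reported as broken.
            skipped.append(
                {
                    "file": file_path,
                    "from_url": old_url,
                    "to_url": new_url,
                    "reason": "target URL is broken",
                }
            )
        else:
            safe.append((file_path, old_url, new_url))
    return safe, skipped
```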
3. File Updates¶
For each Markdown file with redirects:
Reads the entire file content
Performs global string replacement of old URL → new URL
Tracks the number of occurrences replaced
Writes the updated content back
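The four steps above could look roughly like the sketch below. The helper name `replace_url_in_file` is hypothetical, and UTF-8 encoding is an assumption; the script's own handling may differ.

```python
from pathlib import Path


def replace_url_in_file(path: Path, old_url: str, new_url: str) -> int:
    """Replace every occurrence of old_url in the file; return the count."""
    text = path.read_text(encoding="utf-8")  # read the entire file
    occurrences = text.count(old_url)        # track how many will change
    if occurrences:
        # Global, exact string replacement, then write the result back.
        path.write_text(text.replace(old_url, new_url), encoding="utf-8")
    return occurrences
```

Returning the count lets the caller record it as the occurrences value in the audit log and skip files where nothing changed.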
4. Validation¶
After all updates:
Runs make linkcheck again
Verifies that none of the newly updated URLs are broken
Reports validation status (PASS/FAIL)
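A minimal sketch of this validation gate, under stated assumptions: `run_linkcheck`, `find_newly_broken`, and `validation_status` are hypothetical names, and the broken URLs found by the second linkcheck are assumed to be available as a set of strings. It is not the script's verify_updates() method.

```python
import subprocess
from typing import Iterable, List


def run_linkcheck(docs_dir: str = "docs") -> str:
    """Run `make linkcheck` inside docs_dir and return its combined output."""
    result = subprocess.run(
        ["make", "linkcheck"],
        cwd=docs_dir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # text mode; available on Python 3.6
    )
    return result.stdout


def find_newly_broken(
    updated_urls: Iterable[str], broken_after: Iterable[str]
) -> List[str]:
    """URLs the script wrote that the second linkcheck reports as broken."""
    return sorted(set(updated_urls) & set(broken_after))


def validation_status(
    updated_urls: Iterable[str], broken_after: Iterable[str]
) -> str:
    """PASS only if none of the updated URLs show up as broken."""
    return "FAIL" if find_newly_broken(updated_urls, broken_after) else "PASS"
```

Only the URLs the script itself wrote are compared, so links that were already broken before the run do not cause a FAIL in this sketch.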
5. Reporting¶
Generates two comprehensive reports:
redirect_changes_summary.json - Machine-readable detailed log
redirect_update_report.md - Human-readable summary
6. Cleanup¶
Automatically removes temporary files:
docs/linkcheck.txt
docs/linkcheck_validation.txt
Output Files¶
redirect_changes_summary.json¶
Detailed JSON log containing:
{
  "total_redirects_found": 183,
  "successful_updates": [
    {
      "file": "path/to/file.md",
      "from_url": "http://old-url.com",
      "to_url": "https://new-url.com",
      "occurrences": 2
    }
  ],
  "skipped_updates": [],
  "broken_urls_count": 2
}
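Because the log is plain JSON, it is easy to post-process. The sketch below assumes only the keys shown in the sample above; `load_summary` and `summarize_changes` are hypothetical helpers, not part of the script.

```python
import json
from typing import Any, Dict


def load_summary(path: str = "redirect_changes_summary.json") -> Dict[str, Any]:
    """Load the JSON log written by the script."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_changes(summary: Dict[str, Any]) -> Dict[str, int]:
    """Condense the log into a few headline numbers."""
    updates = summary.get("successful_updates", [])
    return {
        "files_touched": len({entry["file"] for entry in updates}),
        "occurrences_replaced": sum(entry["occurrences"] for entry in updates),
        "skipped": len(summary.get("skipped_updates", [])),
    }
```

For example, `summarize_changes(load_summary())` run from the project root gives a quick count to compare against `git diff --stat` before committing.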
redirect_update_report.md¶
Human-readable summary containing:
Total redirects found
Number of successful updates
Number of skipped updates (with reasons)
Validation status (PASS/FAIL)
Complete list of all changes made
Any validation issues found
Configuration¶
The script can be customized by modifying the RedirectUpdater class initialization:
updater = RedirectUpdater(docs_dir="docs") # Change docs directory path
Safety Features¶
1. Broken Link Prevention¶
Never replaces a redirect if the target URL is broken
Logs all skipped updates with reasons
2. Validation Gate¶
Runs a second linkcheck after all changes
Fails if any updated URLs are now broken
Returns non-zero exit code on validation failure
3. File Scope Limitation¶
Only modifies .md (Markdown) files
Ignores all other file types
Specified in the regex pattern: ([^\s:]+\.md)
4. Audit Trail¶
Records every change made
Tracks occurrences replaced per file
Provides complete before/after URL mapping
Limitations¶
Markdown Only: Only updates .md files (not .rst, .html, etc.)
Exact String Match: Uses simple string replacement (not regex or URL parsing)
Global Replacement: Replaces ALL occurrences of a URL in a file (not selective)
Single Run: Designed for one-time execution (not incremental updates)
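One practical consequence of exact, global string replacement is worth illustrating. In general, replacing a URL that is a prefix of a longer URL also rewrites the longer one. The URLs below are made up for illustration, and the source does not say whether the script guards against this case, so treat it as something to look for in `git diff`.

```python
text = (
    "See http://old.example/docs for the guide and "
    "http://old.example/docs/api for the API reference."
)

# Plain str.replace has no notion of URL boundaries, so the longer URL
# that merely starts with the old one is rewritten as well.
updated = text.replace("http://old.example/docs", "https://new.example/guide")

print(updated)
# See https://new.example/guide for the guide and https://new.example/guide/api for the API reference.
```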
Troubleshooting¶
“No redirects found to process”¶
This means linkcheck didn’t find any redirects. Verify:
docs/_build/output.txt exists and contains redirect entries
The pattern [redirected ...] appears in the output
“Validation FAILED”¶
The script found that some updated URLs are now broken. Check:
redirect_update_report.md for the list of broken URLs
The linkcheck output for details on why they’re broken
Consider reverting changes:
git checkout -- docs/
“File not found” warnings¶
The linkcheck output references files that don’t exist. This can happen if:
Files were moved/renamed since last build
The build is out of sync with source files
Solution: Run make clean, then make linkcheck again
Integration with CI/CD¶
GitHub Actions Example¶
name: Update Redirects
on:
  workflow_dispatch: # Manual trigger
jobs:
  update-redirects:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y make python3 python3-venv
          make install
      - name: Run redirect update script
        run: python3 update_redirects.py
      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v5
        with:
          commit-message: "docs: Update redirected URLs"
          title: "Automated redirect URL updates"
          body-path: redirect_update_report.md
          branch: automated-redirect-updates
Best Practices¶
Review Changes: Always review the generated reports before committing
Test Locally: Run the script locally first, check git diff
Commit Atomically: Commit redirect updates separately from other changes
Run Periodically: Schedule monthly or quarterly runs to keep links fresh
Check Validation: Never commit if validation fails
Exit Codes¶
0 - Success (all redirects updated and validated)
1 - Failure (validation failed or error occurred)
Examples¶
Successful Run¶
================================================================================
Autonomous Redirect Link Remediation
================================================================================
Parsing docs/linkcheck.txt...
Found 2 broken URLs
Found 183 valid redirects to process
Skipped 0 redirects (target URL is broken)
Updating Markdown files...
contributing/get-help.md: http://old.com -> https://new.com (1 occurrence(s))
...
Successfully updated 165 redirects across 82 files
Running validation linkcheck...
PASSED: All 165 updated URLs are valid!
Process completed: SUCCESS
Failed Run (Skipped Redirects)¶
Found 5 broken URLs
Found 100 valid redirects to process
Skipped 3 redirects (target URL is broken)
- http://example.com/redirect -> http://broken-site.com (broken)
Contributing¶
To modify the script behavior:
Change file scope: Modify the regex pattern in parse_linkcheck_output()
Add more checks: Extend the verify_updates() method
Custom reporting: Modify the report generation in main()
Support¶
Review the generated redirect_update_report.md for diagnostic information
Check Sphinx linkcheck documentation for linkcheck-specific issues