I wasted a weekend building a link checker. Manual clicking would've been faster.

Source: DEV Community
Found 800 URLs in my old blog export. Some were broken. Clicking through them manually was not happening.

Started simple: fetch each URL and check whether it returns an error.

```python
import requests

with open('urls.txt') as f:
    urls = [line.strip() for line in f]

for url in urls:
    r = requests.get(url, timeout=10)
    if r.status_code >= 400:
        print(f'broken: {url}')
```

Ran it. Got rate limited after 20 requests. Added a delay.

```python
import time

for url in urls:
    r = requests.get(url, timeout=10)
    if r.status_code >= 400:
        print(f'broken: {url}')
    time.sleep(1)
```

Better. The site stopped blocking me. The script ran for a few hours and found 47 broken links. Cool, right? Done.

Except I went back to manually check a few of the "working" URLs, and some were clearly dead. What?

Turns out the script was stopping at redirects. If URL A redirected to URL B, it only checked that A existed. B could be a 404 and nobody would know. So I rewrote it to follow the whole chain.

```python
import requests
from urllib.parse import urljoin

def check_url(url):
    # Follow each redirect hop manually so the status code we check
    # belongs to the final destination, not just the first URL.
    current = url
    hops = 0
    while True:
        r = requests.get(current, timeout=10, allow_redirects=False)
        location = r.headers.get('Location')
        if r.status_code in (301, 302, 303, 307, 308) and location and hops < 10:
            hops += 1
            # Location may be relative; resolve it against the current URL
            current = urljoin(current, location)
            continue
        return r.status_code
```
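Side note on the rate limiting: a blanket `time.sleep(1)` works, but `requests` can also back off automatically via urllib3's `Retry`. A minimal sketch, not what the script above actually ran; the retry count, backoff factor, and status codes here are illustrative:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# A session that retries rate-limit responses (429) and transient
# server errors (503) with exponential backoff: 0.5s, 1s, 2s between
# attempts, giving up after 3 retries.
session = requests.Session()
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[429, 503])
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
```

With this, `session.get(url, timeout=10)` handles the waiting itself, so the main loop only pays a delay when the server actually pushes back.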