Broken links #12808

Closed
50 of 60 tasks
spageektti opened this issue May 19, 2024 · 8 comments
Labels: help wanted, page edit

Comments

@spageektti
Member

spageektti commented May 19, 2024

I created a Python script that looks for broken links.
It may contain some false positives.
The pages should be checked and corrected.

Script
import os
import re
import requests

def find_all_files(root_dir):
    """Return the path of every file under root_dir, recursively."""
    all_files = []
    for subdir, _, files in os.walk(root_dir):
        for file in files:
            all_files.append(os.path.join(subdir, file))
    return all_files

def extract_link(line):
    """Return the URL from a '> More information: <URL>' line, or None."""
    match = re.search(r'> More information: <(https?://[^>]+)>', line)
    if match:
        return match.group(1)
    return None

def check_link(url):
    """Return False if a HEAD request yields 404 or fails; True otherwise."""
    try:
        response = requests.head(url, allow_redirects=True, timeout=10)
        # Only 404 counts as broken; other error codes are treated as working.
        return response.status_code != 404
    except requests.RequestException:
        # Timeouts and connection errors are also reported as broken,
        # which is one possible source of false positives.
        return False

def process_files(root_dir):
    """Print the relative path of every page whose link looks broken."""
    all_files = find_all_files(root_dir)
    for file_path in all_files:
        with open(file_path, 'r', encoding='utf-8') as file:
            for line in file:
                link = extract_link(line)
                if link and not check_link(link):
                    rel_path = os.path.relpath(file_path, root_dir)
                    print(f'Broken link in file: {rel_path}')

if __name__ == '__main__':
    root_directory = 'tldr/pages/'
    process_files(root_directory)
Output
Broken link in file: common/bru.md
Broken link in file: common/cabal.md
Broken link in file: common/clash.md
Broken link in file: common/deemix.md
Broken link in file: common/docker-machine.md
Broken link in file: common/gcloud-info.md
Broken link in file: common/golangci-lint.md
Broken link in file: common/hub-browse.md
Broken link in file: common/idnits.md
Broken link in file: common/jdupes.md
Broken link in file: common/magento.md
Broken link in file: common/mutagen.md
Broken link in file: common/nf-core.md
Broken link in file: common/ouch.md
Broken link in file: common/pnpx.md
Broken link in file: common/qemu-img.md
Broken link in file: common/runsv.md
Broken link in file: common/runsvchdir.md
Broken link in file: common/runsvdir.md
Broken link in file: common/sam2p.md
Broken link in file: common/secrethub.md
Broken link in file: common/slimrb.md
Broken link in file: common/spatial.md
Broken link in file: common/spfquery.md
Broken link in file: common/sv.md
Broken link in file: common/texdoc.md
Broken link in file: common/tree.md
Broken link in file: common/unison.md
Broken link in file: common/virsh.md
Broken link in file: common/wireplumber.md
Broken link in file: common/wpexec.md
Broken link in file: common/xdelta.md
Broken link in file: linux/asterisk.md
Broken link in file: linux/burpsuite.md
Broken link in file: linux/check-language-support.md
Broken link in file: linux/eopkg.md
Broken link in file: linux/feedreader.md
Broken link in file: linux/genid.md
Broken link in file: linux/guix-package.md
Broken link in file: linux/gummy.md
Broken link in file: linux/kdialog.md
Broken link in file: linux/lxterminal.md
Broken link in file: linux/ntpdate.md
Broken link in file: linux/obabel.md
Broken link in file: linux/pro.md
Broken link in file: linux/rpmbuild.md
Broken link in file: linux/swupd.md
Broken link in file: linux/virt-manager.md
Broken link in file: linux/vrms.md
Broken link in file: linux/warpd.md
Broken link in file: osx/airport.md
Broken link in file: osx/bnepd.md
Broken link in file: osx/emond.md
Broken link in file: osx/safeejectgpu.md
Broken link in file: osx/shuf.md
Broken link in file: osx/tail.md
Broken link in file: osx/webinspectord.md
Broken link in file: osx/whence.md
Broken link in file: osx/yaa.md
Broken link in file: windows/reg-flags.md
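
Some of the false positives mentioned above may come from servers that answer `HEAD` requests differently from `GET` requests. The following is a minimal sketch of one way to reduce them, not part of the posted script: it retries with `GET` when `HEAD` returns 404, 405, or 501. The function name `check_link_with_fallback` and the `fetch` callable are hypothetical, introduced only so the logic can be exercised without network access.

```python
from typing import Callable

# Hypothetical helper type: fetch(method, url) returns an HTTP status code.
Fetch = Callable[[str, str], int]

def check_link_with_fallback(url: str, fetch: Fetch) -> bool:
    """Return True unless the link still answers 404 after a GET retry."""
    status = fetch("HEAD", url)
    # Assumption: 404, 405, and 501 on HEAD may only mean that the server
    # does not handle HEAD properly, so confirm with a GET request.
    if status in (404, 405, 501):
        status = fetch("GET", url)
    # Same rule as the posted script: only 404 is treated as broken.
    return status != 404
```

With the `requests` library, `fetch` could be something like `lambda method, url: requests.request(method, url, allow_redirects=True, timeout=10).status_code`, wrapped in the same `try`/`except requests.RequestException` handling the original script uses.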

Pages with broken links:

spageektti added the page edit and help wanted labels on May 19, 2024
@kbdharun
Member

kbdharun commented May 19, 2024

Thanks for the work on this simple script.

We already have a WIP script by @vitorhcl for this with some QoL improvements.

If possible, could you check out #12289 (discussion) and #12506 (implementation) and help improve the existing script discussed there?

@tricantivu
Member

Hi, I ticked the boxes for genid and virt-manager. The latter's link works properly for me.

@spageektti
Member Author

spageektti commented May 28, 2024

I will open a PR for osx and windows tomorrow. (#12873)

@tricantivu
Member

tricantivu commented May 28, 2024

> I will open a PR for osx and windows tomorrow.

Did you find anything for spatial? I couldn't, besides the latest Web Archive snapshot and here

@spageektti
Member Author

> > I will open a PR for osx and windows tomorrow.
>
> Did you find anything for spatial? I couldn't, besides the latest Web Archive snapshot and here

I checked improbable.io and it seems that they have been inactive for 2 years on all of their social media. SpatialOS seems to be a dead project. I only found another, similar project with the same name.

@tricantivu
Member

> I checked improbable.io and it seems that they have been inactive for 2 years on all of their social media. SpatialOS seems to be a dead project. I only found another, similar project with the same name.

I suggest deletion

@sebastiaanspeck
Member

A continuously updated issue/list: tldr-pages/tldr-maintenance#129. This issue is updated every Monday with the results of checking all of pages/ for broken links (HTTP code is 100…103 or 200…299 or 429).
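
The parenthetical above can be read as the set of status codes that the weekly check treats as working; that reading is an assumption, since the comment does not say so explicitly. Under that assumption, the rule can be sketched as follows. The function name `is_acceptable_status` is hypothetical and is not taken from the tldr-maintenance repository.

```python
def is_acceptable_status(code: int) -> bool:
    """Return True for status codes assumed to mean the link works.

    Assumed reading of the comment: 100-103, 200-299, and 429 are accepted;
    every other code marks the link as broken.
    """
    return 100 <= code <= 103 or 200 <= code <= 299 or code == 429
```

Accepting 429 (Too Many Requests) would avoid flagging a page only because the server rate-limited the checker.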

@sebastiaanspeck
Member

About magento: it has been taken over by Adobe, and the link now redirects to https://business.adobe.com/products/magento/magento-commerce.html

sebastiaanspeck closed this as not planned on Oct 17, 2024
4 participants