trying to create script to extract and check URLs

Jerry3904
Administrator
Posts: 21960
Joined: Wed Jul 19, 2006 6:13 am

#1 Post by Jerry3904 »

I found a nice script to extract the URLs, which I adapted:

Code: Select all

pdftk SOMETHING.pdf cat output out.pdf uncompress; strings out.pdf | grep -i http > ~/MX/Documents/SOMETHING-URLs.txt
I had to install pdftk first. That produced this kind of output for S2 (I had to create a PDF from S2.odt; more on that later):
/URI (https://en.wikipedia.org/wiki/32-bit)
/URI (https://antixlinux.com/)
/URI (http://en.wikipedia.org/wiki/Physical_Address_Extension)
/URI (https://en.wikipedia.org/wiki/64-bit_computing)
. Arimo offers improved on-screen readability characteristics and the pan-European WGL character set and solves the needs of developers looking for width-compatible fonts to address document portability across platforms.http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
. Cousine offers improved on-screen readability characteristics and the pan-European WGL character set and solves the needs of developers looking for width-compatible fonts to address document portability across platforms.http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
. Tinos offers improved on-screen readability characteristics and the pan-European WGL character set and solves the needs of developers looking for width-compatible fonts to address document portability across platforms.http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
. Tinos offers improved on-screen readability characteristics and the pan-European WGL character set and solves the needs of developers looking for width-compatible fonts to address document portability across platforms.http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
. Tinos offers improved on-screen readability characteristics and the pan-European WGL character set and solves the needs of developers looking for width-compatible fonts to address document portability across platforms.http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
. Cousine offers improved on-screen readability characteristics and the pan-European WGL character set and solves the needs of developers looking for width-compatible fonts to address document portability across platforms.http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
. Arimo offers improved on-screen readability characteristics and the pan-European WGL character set and solves the needs of developers looking for width-compatible fonts to address document portability across platforms.http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
. Tinos offers improved on-screen readability characteristics and the pan-European WGL character set and solves the needs of developers looking for width-compatible fonts to address document portability across platforms.http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
/URI (https://www.techsupportalert.com/conten ... lained.htm)
/URI (https://support.apple.com/en-us/HT201948)
/URI (https://support.microsoft.com/en-us/kb/827218)
/URI (https://mxlinux.org/download-links)
/URI (https://mxlinux.org/create-mx-live-usb- ... apshot-iso)
/URI (http://www.mxlinux.org/download-links)
/URI (http://en.wikipedia.org/wiki/ISO_9660)
/URI (https://mxlinux.org/videos/torrent)
/URI (https://mxlinux.org/download-links)
/URI (http://en.wikipedia.org/wiki/BitTorrent)
/URI (https://mxlinux.org/wiki/system/iso-download-mirrors)
/URI (https://mxlinux.org/download-links)
/URI (https://en.wikipedia.org/wiki/SHA-2)
/URI (http://www.winmd5.com/)
/URI (https://mxlinux.org/wiki/system/dd-command)
/URI (https://rufus.akeo.ie/)
/URI (https://mxlinux.org/wiki/system/signed-iso-files)
/URI (http://forums.debian.net/)
/URI (https://wiki.debian.org/InstallingDebianOn/Apple)
/URI (http://www.nirsoft.net/utils/product_cd_key_viewer.html)
/URI (http://mxlinux.org/wiki/help-files/help-disk-manager)
/URI (https://www.youtube.com/watch?v=khg6_sdrOBQ)
/URI (https://www.youtube.com/watch?v=lf8eXhCKghg)
/URI (http://gparted.org/display-doc.php?name=help-manual)
/URI (https://en.wikipedia.org/wiki/Universal ... identifier)
/URI (http://www.plop.at/)
/URI (https://mxlinux.org/wiki/system/boot-parameters)
/URI (https://mxlinux.org/wiki/system/uefi)
/URI (https://en.wikipedia.org/wiki/Unified_E ... _Interface)
/URI (https://mxlinux.org/uefi-boot-issues-an ... ings-check)
/URI (https://www.mxlinux.org/user_manual_mx1 ... oader.html)
/URI (https://www.mxlinux.org/wiki/system/uefi)
/URI (http://en.wikipedia.org/wiki/Linux_startup_process)
/URI (https://mxlinux.org/wiki/system/boot-parameters)
/URI (http://docs.xfce.org/xfce/getting-started)
/URI (https://mxlinux.org/node/177)
/URI (http://gottcode.org/xfce4-whiskermenu-plugin)
/URI (https://wiki.xfce.org/faq)
/URI (http://www.xfce.org/about)
/URI (https://mxlinux.org/my-home-folder-setu ... sk-manager)
/URI (https://mxlinux.org/mx-linux-17-mx-17-i ... albox-2017)
/URI (https://mxlinux.org/wiki/system/hibernate)
/URI (https://en.wikipedia.org/wiki/S.M.A.R.T.)
/URI (https://mxlinux.org/wiki/help-files/help-disk-manager)
/URI (https://mxlinux.org/wiki/system/boot-parameters)
/URI (https://mxlinux.org/wiki/system/gnome-keyring)
I would like to trim this file down to just the actual URLs in that Section. I turned on line numbers and ran sed:

Code: Select all

sed -i '1,12d' S2-URLs.txt 
That was not correct: it nicely removed the unnecessary material, but it also removed the first 4 real URLs before the adverts and then another one afterwards. I could work from the original, I suppose, but I'm hoping someone can suggest a correction, because I'm not sure the line numbering will be constant from Section to Section.
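
One way around the line-number problem, as a rough sketch: pick lines by what they look like rather than by where they are. This assumes every real link sits on its own line in the form /URI (...), as in the sample above; the function name and the output filename in the usage line are made up for illustration.

```shell
# extract_uri_lines FILE
# Print only what is inside the parentheses of lines shaped like "/URI (...)".
# Lines that do not match (such as the font-license text) are dropped,
# so the result does not depend on line numbers.
extract_uri_lines() {
    sed -n 's|^/URI (\(.*\))$|\1|p' "$1"
}
```

Possible usage: `extract_uri_lines S2-URLs.txt > S2-links.txt`. Unlike `sed -i`, this leaves the original file untouched, so a wrong guess about the line format costs nothing.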

Upcoming: creating a pdf from an odt; autochecking the urls in the final version of the output file.
Production: 5.10, MX-23 Xfce, AMD FX-4130 Quad-Core, GeForce GT 630/PCIe/SSE2, 16 GB, SSD 120 GB, Data 1TB
Personal: Lenovo X1 Carbon with MX-23 Fluxbox and Windows 10
Other: Raspberry Pi 5 with MX-23 Xfce Raspberry Pi Respin

#2 Post by Jerry3904 »

If all else fails, I suppose I could see if PDF Link Editor would run under Wine...

dolphin_oracle
Developer
Posts: 20024
Joined: Sun Dec 16, 2007 1:17 pm

#3 Post by dolphin_oracle »

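Try

Code: Select all

grep -o 'http.*' filename | cut -d')' -f1

to pull out just the hyperlinks, i.e. everything from "http" to the end of the line. The pattern is quoted so that the shell does not try to expand it as a filename glob.

The cut at the end removes the trailing ")".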

Code: Select all

grep -o 'http.*' 1.txt | cut -d')' -f1
https://en.wikipedia.org/wiki/32-bit
https://antixlinux.com/
http://en.wikipedia.org/wiki/Physical_Address_Extension
https://en.wikipedia.org/wiki/64-bit_computing
http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
http://www.ascendercorp.com/http://www. ... mlLicensed under the SIL Open Font License, Version 1.1http://scripts.sil.org/OFL
https://www.techsupportalert.com/conten ... lained.htm
https://support.apple.com/en-us/HT201948
https://support.microsoft.com/en-us/kb/827218
https://mxlinux.org/download-links
https://mxlinux.org/create-mx-live-usb- ... apshot-iso
http://www.mxlinux.org/download-links
http://en.wikipedia.org/wiki/ISO_9660
https://mxlinux.org/videos/torrent
https://mxlinux.org/download-links
http://en.wikipedia.org/wiki/BitTorrent
https://mxlinux.org/wiki/system/iso-download-mirrors
https://mxlinux.org/download-links
https://en.wikipedia.org/wiki/SHA-2
http://www.winmd5.com/
https://mxlinux.org/wiki/system/dd-command
https://rufus.akeo.ie/
https://mxlinux.org/wiki/system/signed-iso-files
http://forums.debian.net/
https://wiki.debian.org/InstallingDebianOn/Apple
http://www.nirsoft.net/utils/product_cd_key_viewer.html
http://mxlinux.org/wiki/help-files/help-disk-manager
https://www.youtube.com/watch?v=khg6_sdrOBQ
https://www.youtube.com/watch?v=lf8eXhCKghg
http://gparted.org/display-doc.php?name=help-manual
https://en.wikipedia.org/wiki/Universal ... identifier
http://www.plop.at/
https://mxlinux.org/wiki/system/boot-parameters
https://mxlinux.org/wiki/system/uefi
https://en.wikipedia.org/wiki/Unified_E ... _Interface
https://mxlinux.org/uefi-boot-issues-an ... ings-check
https://www.mxlinux.org/user_manual_mx1 ... oader.html
https://www.mxlinux.org/wiki/system/uefi
http://en.wikipedia.org/wiki/Linux_startup_process
https://mxlinux.org/wiki/system/boot-parameters
http://docs.xfce.org/xfce/getting-started
https://mxlinux.org/node/177
http://gottcode.org/xfce4-whiskermenu-plugin
https://wiki.xfce.org/faq
http://www.xfce.org/about
https://mxlinux.org/my-home-folder-setu ... sk-manager
https://mxlinux.org/mx-linux-17-mx-17-i ... albox-2017
https://mxlinux.org/wiki/system/hibernate
https://en.wikipedia.org/wiki/S.M.A.R.T.
https://mxlinux.org/wiki/help-files/help-disk-manager
https://mxlinux.org/wiki/system/boot-parameters
https://mxlinux.org/wiki/system/gnome-keyring

Some of the links in the output you provided were shortened with "..." though.
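
The font-license lines still come through in the output above, because they contain "http" as well. A minimal sketch that runs the same grep/cut pipeline only on lines beginning with /URI; it assumes the real links always appear that way, as they do in Jerry's sample, and the function name is made up for illustration.

```shell
# links_only FILE
# Same idea as the grep/cut pipeline, but only lines starting with "/URI"
# are considered, so text that merely mentions a URL is skipped.
links_only() {
    grep '^/URI' "$1" | grep -o 'http.*' | cut -d')' -f1
}
```

Possible usage: `links_only 1.txt`.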
http://www.youtube.com/runwiththedolphin
lenovo ThinkPad X1 Extreme Gen 4 - MX-23
FYI: mx "test" repo is not the same thing as debian testing repo.

aledie
Posts: 188
Joined: Thu Sep 13, 2018 2:06 pm

#4 Post by aledie »

@Jerry: Just out of curiosity I googled it; it seems pyPDF can also do something similar: https://stackoverflow.com/questions/277 ... -in-python
MX-18 (x64): HP 8460p, i5-2540M, 8GB RAM, 256GB SSD, HD3000

#5 Post by Jerry3904 »

Did you look at it? It is intensely complicated, and not the kind of app I would ever want to try to use.

Thanks.

kmathern
Developer
Posts: 2409
Joined: Wed Jul 12, 2006 2:26 pm

#6 Post by kmathern »

I think I can extract urls from the MX-18 user manual with this:

Code: Select all

strings /usr/local/share/doc/mxum.pdf | grep -E 'https?://' | cut -f2 -d'(' | sed 's/)>>$//' | sort

Which results in:

Code: Select all

$ strings /usr/local/share/doc/mxum.pdf | grep -E 'https?://' | cut -f2 -d'(' | sed 's/)>>$//' | sort
file:///home/jb/MX/MX16-1_Manual/1_current/mxum_github/%E2%80%A2%20http://www.debian.org/doc/FAQ/ch-pkgtools.en.html
http://antix.mepis.org/
http://appimage.org/
http://audacious-media-player.org/
http://chromium-bsu.sourceforge.net/
http://clayo.org/osmo/
http://clonezilla.org/
http://conky.sourceforge.net/index.html
http://crontab-generator.org/
http://dansguardian.org/
http://deadbeef.sourceforge.net/
http://docs.xfce.org/xfce/getting-started
http://docs.xfce.org/xfce/thunar/custom-actions
http://docs.xfce.org/xfce/thunar/start
http://docs.xfce.org/xfce/xfce4-panel/start
http://docs.xfce.org/xfce/xfce4-settings/accessibility
http://dosbox.sourceforge.net/
http://dosbox.sourceforge.net/wiki
http://dossizola.sourceforge.net/
http://en.cppreference.com/w/c/chrono/strftime
http://en.wikipedia.org/wiki/BitTorrent
http://en.wikipedia.org/wiki/Comparison_of_platform_virtualization_software%20
http://en.wikipedia.org/wiki/Comparison_of_X_Window_System_desktop_environments
http://en.wikipedia.org/wiki/CrossOver
http://en.wikipedia.org/wiki/Debian
http://en.wikipedia.org/wiki/Emulator
http://en.wikipedia.org/wiki/Ext4
http://en.wikipedia.org/wiki/File_system
http://en.wikipedia.org/wiki/File_system
http://en.wikipedia.org/wiki/ISO_9660
http://en.wikipedia.org/wiki/Linux_kernel
http://en.wikipedia.org/wiki/Linux_startup_process
http://en.wikipedia.org/wiki/Media_Transfer_Protocol
http://en.wikipedia.org/wiki/NDISwrapper
http://en.wikipedia.org/wiki/Personal_firewall
http://en.wikipedia.org/wiki/Physical_Address_Extension
http://en.wikipedia.org/wiki/Runlevel
http://en.wikipedia.org/wiki/Secure_Shell
http://en.wikipedia.org/wiki/Software_repository
http://en.wikipedia.org/wiki/Unix
http://en.wikipedia.org/wiki/Virtual_machine%20
http://en.wikipedia.org/wiki/Window_manager
http://forums.debian.net/
http://gnome-schedule.sourceforge.net/
http://gnucash.org/
http://goodies.xfce.org/projects/panel-plugins/start
http://goodies.xfce.org/projects/panel-plugins/xfce4-notes-plugin
http://gottcode.org/
http://gottcode.org/xfce4-whiskermenu-plugin
http://gottcode.org/xfce4-whiskermenu-plugin/
http://gparted.org/display-doc.php?name=help-manual
http://gscan2pdf.sourceforge.net/
http://gscan2pdf.sourceforge.net/
http://gufw.org/
http://kodi.tv/
http://kqlives.sourceforge.net/
http://lgames.sourceforge.net/
http://lincity.sourceforge.net/
http://linuxwireless.org/
http://linux-wless.passys.nl/
http://littlesvr.ca/asunder/
http://luckybackup.sourceforge.net/manual.html
http://lxde.org/
http://mars-game.sourceforge.net/?page_id=972
http://mate-desktop.org/
http://mepis.org/
http://mtpaint.sourceforge.net/
http://mxlinux.org/wiki/help-files/help-disk-manager
http://mxlinux.org/wiki/system/alias
http://mxlinux.org/wiki/system/kde-on-mx-linux
http://pclosmag.com/download.php?f=XfceTipsTricksSE.pdf
http://pdfshuffler.sourceforge.net/
http://pidgin.im/
http://porteus-kiosk.org/index.html
http://pysolfc.sourceforge.net/
http://recordmydesktop.sourceforge.net/about.php
http://ri-li.sourceforge.net/
https://01.org/linuxgraphics/downloads
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/sec-Printer_Configuration.html
https://addons.mozilla.org/en-US/firefox/
https://addons.mozilla.org/en-US/firefox/addon/mkiosk/
https://answers.launchpad.net/grub-customizer/+faq/1355
https://antixlinux.com/
https://backports.debian.org/
https://budgie-desktop.org/home/
https://displaycal.net/
https://dl.google.com/linux/direct/google-talkplugin_current_i386.deb
https://en.wikipedia.org/wiki/32-bit
https://en.wikipedia.org/wiki/64-bit_computing
https://en.wikipedia.org/wiki/Alien_\
https://en.wikipedia.org/wiki/APT_\
https://en.wikipedia.org/wiki/APT_\
https://en.wikipedia.org/wiki/Beneath_a_Steel_Sky
https://en.wikipedia.org/wiki/Comparison_of_file_synchronization_software
https://en.wikipedia.org/wiki/Compiz#External_links
https://en.wikipedia.org/wiki/Compositing_window_manager
https://en.wikipedia.org/wiki/Dd_\
https://en.wikipedia.org/wiki/Domain_Name_System
https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support
https://en.wikipedia.org/wiki/File_synchronization
https://en.wikipedia.org/wiki/Flatpak
https://en.wikipedia.org/wiki/Gamma_correction
https://en.wikipedia.org/wiki/Init
https://en.wikipedia.org/wiki/Magic_SysRq_key
https://en.wikipedia.org/wiki/NDISwrapper
https://en.wikipedia.org/wiki/Neko_\
https://en.wikipedia.org/wiki/Nouveau_\
https://en.wikipedia.org/wiki/Scripting_language
https://en.wikipedia.org/wiki/Shebang_\
https://en.wikipedia.org/wiki/S.M.A.R.T.
https://en.wikipedia.org/wiki/Solid-state_drive
https://en.wikipedia.org/wiki/Systemd
https://en.wikipedia.org/wiki/Traceroute
https://en.wikipedia.org/wiki/Trim_\
https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface
https://en.wikipedia.org/wiki/Universally_unique_identifier
https://flathub.org/
https://flathub.org/#setup
https://forum.mxlinux.org/viewforum.php?f=101
https://forum.mxlinux.org/viewforum.php?f=96
https://forum.mxlinux.org/viewforum.php?f=96
https://forum.mxlinux.org/viewforum.php?f=97
https://forum.xfce.org/
https://github.com/blueman-project/blueman/wiki/Troubleshooting
https://github.com/MX-Linux
https://gitlab.gnome.org/GNOME/simple-scan
https://gottcode.org/kapow/
https://handbrake.fr/
https://help.ubuntu.com/community/Locale
https://help.ubuntu.com/community/NetworkManager
https://hexchat.github.io/
https://inkscape.org/en/
https://jitsi.org/
https://jitsi.org/
https://justgetflux.com/linux.html
https://launchpad.net/deja-dup
https://launchpad.net/gdebi%20
https://launchpad.net/qpdfview
https://libregamewiki.org/Gnome_Hearts
http://smplayer.sourceforge.net/
http://smxi.org/
http://smxi.org/
http://smxi.org/docs/
http://smxi.org/docs/sgfxi-manual.htm
https://mxlinux.org/about-us
https://mxlinux.org/changing-your-kernel-antix-live-usb-mx
https://mxlinux.org/community-repo
https://mxlinux.org/community-repo
https://mxlinux.org/create-mx-live-usb-windows-using-mx-monthly-snapshot-iso
https://mxlinux.org/customize-xfce-412-dropdown-terminal-xfce4-terminal
https://mxlinux.org/doc_mx/listpackages.txt
https://mxlinux.org/download-links
https://mxlinux.org/download-links
https://mxlinux.org/download-links
https://mxlinux.org/install-apps-mx-package-installer
https://mxlinux.org/make-snapshot-installed-system
https://mxlinux.org/migration
https://mxlinux.org/mx-15-liveusb-persistence-legacy-boot-mode
https://mxlinux.org/mx-15-liveusb-persistence-uefi-boot-mode
https://mxlinux.org/mx-15-new-utilities
https://mxlinux.org/mx-16-kwin
https://mxlinux.org/mx-16-live-usb-persistence
https://mxlinux.org/mx-16-remaster-your-live-usb
https://mxlinux.org/mx-17-live-usb-installing-apps
https://mxlinux.org/mx-17-whats-new
https://mxlinux.org/mx-linux-17-make-live-usb-persistence
https://mxlinux.org/mx-linux-17-mx-17-installation-overview-oracle-virtualbox-2017
https://mxlinux.org/mx-menu-editor
https://mxlinux.org/mx-spins-stevos-kde
https://mxlinux.org/mx-spins-workbench
https://mxlinux.org/my-home-folder-setup-and-disk-manager
https://mxlinux.org/netflix-32-bit-linux
https://mxlinux.org/node/140
https://mxlinux.org/node/141
https://mxlinux.org/node/177
https://mxlinux.org/smartphones-mx-16-samsung-galaxy-s5-and-iphone-6s
https://mxlinux.org/support
https://mxlinux.org/things-do-after-installing-mx
https://mxlinux.org/uefi-boot-issues-and-some-settings-check
https://mxlinux.org/update-netflix-32-bit-linux
https://mxlinux.org/videos/customizing-desktop
https://mxlinux.org/videos/customizing-whisker-menu
https://mxlinux.org/videos/mx-apps
https://mxlinux.org/videos/samba-config-manual-method
https://mxlinux.org/videos/thumbnail-images-thunar
https://mxlinux.org/videos/torrent
https://mxlinux.org/videos/virtualbox
https://mxlinux.org/videos/whisker-menu-part-1
https://mxlinux.org/wiki/antix-faqs-applications/guide-smxisgfxiinxi
https://mxlinux.org/wiki/applications-networking/ndiswrapper
https://mxlinux.org/wiki/applications/passwords-and-keys
https://mxlinux.org/wiki/applications/pdf
https://mxlinux.org/wiki/applications-system/aptitude
https://mxlinux.org/wiki/applications-system/compiling-software
https://mxlinux.org/wiki/applications-system/synaptic-errors
https://mxlinux.org/wiki/applications-system/synaptic-errors
https://mxlinux.org/wiki/applications/thunar-custom-actions
https://mxlinux.org/wiki/applications/thunar-custom-actions
https://mxlinux.org/wiki/applications/thunar-custom-actions
https://mxlinux.org/wiki/hardware-networking/broadcom-wireless
https://mxlinux.org/wiki/hardware/printer-drivers
https://mxlinux.org/wiki/hardware/printer-drivers
https://mxlinux.org/wiki/hardware/scanner
https://mxlinux.org/wiki/hardware/unsupported-nvidia-gpus
https://mxlinux.org/wiki/hardware/unsupported-nvidia-gpus
https://mxlinux.org/wiki/help-files/help-amdati-and-nvidia-installers
https://mxlinux.org/wiki/help-files/help-disk-manager
https://mxlinux.org/wiki/help-files/help-live-remasterpersistence-remastercc
https://mxlinux.org/wiki/help-files/help-live-usb-kernel-updater
https://mxlinux.org/wiki/help-files/help-live-usb-maker
https://mxlinux.org/wiki/help-files/help-live-usb-maker
https://mxlinux.org/wiki/help-files/help-mx-apt-notifier
https://mxlinux.org/wiki/help-files/help-mx-boot-repair
https://mxlinux.org/wiki/help-files/help-mx-broadcom-manager
https://mxlinux.org/wiki/help-files/help-mx-check-apt-gpg
https://mxlinux.org/wiki/help-files/help-mx-codecs-installer
https://mxlinux.org/wiki/help-files/help-mx-conky
https://mxlinux.org/wiki/help-files/help-mx-idevice-mounter
https://mxlinux.org/wiki/help-files/help-mx-menu-editor
https://mxlinux.org/wiki/help-files/help-mx-package-installer
https://mxlinux.org/wiki/help-files/help-mx-repo-manager
https://mxlinux.org/wiki/help-files/help-mx-save-system-iso-snapshot
https://mxlinux.org/wiki/help-files/help-mx-system-sounds
https://mxlinux.org/wiki/help-files/help-mx-tweak
https://mxlinux.org/wiki/help-files/help-mx-user-manager
https://mxlinux.org/wiki/help-files/help-sound-card
https://mxlinux.org/wiki/networking/nfs
https://mxlinux.org/wiki/networking/vpn
https://mxlinux.org/wiki/networking/vpn
https://mxlinux.org/wiki/other/application-migration-aptik
https://mxlinux.org/wiki/other/chinese-simplified-input
https://mxlinux.org/wiki/other/high-resolution-displays
https://mxlinux.org/wiki/other/hotcorner
https://mxlinux.org/wiki/sound-not-working
https://mxlinux.org/wiki/sound-not-working
https://mxlinux.org/wiki/sound-not-working
https://mxlinux.org/wiki/system/add-ppa-repository
https://mxlinux.org/wiki/system/boot-parameters
https://mxlinux.org/wiki/system/boot-parameters
https://mxlinux.org/wiki/system/boot-parameters
https://mxlinux.org/wiki/system/boot-parameters
https://mxlinux.org/wiki/system/boot-parameters
https://mxlinux.org/wiki/system/compiling
https://mxlinux.org/wiki/system/dosemu
https://mxlinux.org/wiki/system/format-ext4-filesystem-be-owned-regular-user
https://mxlinux.org/wiki/system/frugal-installation
https://mxlinux.org/wiki/system/hibernate
https://mxlinux.org/wiki/system/installing-software
https://mxlinux.org/wiki/system/permissions
https://mxlinux.org/wiki/system/permissions
https://mxlinux.org/wiki/system/repos-mx-linux
https://mxlinux.org/wiki/system/self-contained-packages
https://mxlinux.org/wiki/system/signed-iso-files
https://mxlinux.org/wiki/system/systemd
https://mxlinux.org/wiki/system/time-settings
https://mxlinux.org/wiki/system/uefi
https://mxlinux.org/wiki/system/wine
https://mxlinux.org/wiki/xfce/xfce-commands-and-other-useful-stuff
https://mxlinux.org/wiki/xfce/xfce-commands-and-other-useful-stuff
https://nomacs.org/
http://sourceforge.net/projects/gould
http://sourceforge.net/projects/mirageiv.berlios/
https://rufus.akeo.ie/
https://snapcraft.io/
https://sourceforge.net/projects/clamav/
https://support.apple.com/en-us/HT201948
https://support.microsoft.com/en-us/kb/827218
https://support.opendns.com/forums/21618374
http://staffwww.itn.liu.se/~stegu/xteddy/index.html
http://supertuxkart.sourceforge.net/
http://supertux.lethargik.org/
http://support.amd.com/
https://wiki.archlinux.org/index.php/Backup_programs
https://wiki.archlinux.org/index.php/Blueman
https://wiki.archlinux.org/index.php/disk_cloning
https://wiki.archlinux.org/index.php/Lm_sensors
https://wiki.archlinux.org/index.php/PulseAudio
https://wiki.archlinux.org/index.php/Webcam
https://wiki.archlinux.org/index.php/Wireless
https://wiki.debian.org/accessibility
https://wiki.debian.org/AtiHowTo
https://wiki.debian.org/BluetoothUser#Pairing
https://wiki.debian.org/InstallingDebianOn/Apple
https://wiki.debian.org/mtp
https://wiki.gnome.org/Apps/EasyTAG
https://wiki.gnome.org/Apps/Mines
https://wiki.xfce.org/faq
https://wiki.xfce.org/howto/kiosk_mode
https://wine-staging.com/
https://www.clementine-player.org/
https://www.debian.org/doc/manuals/apt-guide/ch2.en.html
https://www.debian.org/doc/manuals/apt-guide/ch2.en.html
https://www.freedesktop.org/wiki/Software/LightDM/
https://www.freefilesync.org/
https://www.gnu.org/philosophy/categories.en.html
https://www.google.com/earth/download/ge/agree.html
https://www.lesbonscomptes.com/recoll/
https://www.linux.com/learn/tutorials/456149:manage-passwords-encryption-keys-and-more-with-seahorse
https://www.mozilla.org/
https://www.mozilla.org/en-US/projects/calendar/
https://www.mozilla.org/thunderbird/
https://www.mxlinux.org/bugs-features
https://www.mxlinux.org/user_manual_mx15/MX_bootloader.html
https://www.mxlinux.org/videos/samba-config-tool
https://www.mxlinux.org/wiki/help-files/help-mx-usb-unmounter
https://www.mxlinux.org/wiki/system/font-adjustment
https://www.mxlinux.org/wiki/system/uefi
https://www.netflix.com/us/
https://www.plex.tv/
https://www.samba.org/samba/docs/man/using_samba/toc.html
https://www.skype.com/en/
https://www.teamviewer.us/
https://www.techsupportalert.com/content/32-bit-and-64-bit-explained.htm
https://www.virtualbox.org/manual/ch03.html#intro-64bitguests
https://www.virtualbox.org/wiki/VirtualBox
https://www.youtube.com/watch?v=khg6_sdrOBQ
https://www.youtube.com/watch?v=lf8eXhCKghg
http://tldp.org/LDP/Bash-Beginners-Guide/html/index.html
http://tuxracer.sourceforge.net/
http://unetbootin.github.io/
http://web.airdroid.com/
http://wiki.debian.org/Modem/3G
http://wiki.debian.org/WiFi
http://wiki.linuxquestions.org/wiki/Crontab
http://www.adobe.com/products/reader.html
http://www.alandmoore.com/blog/2011/11/05/creating-a-kiosk-with-linux-and-x11-2011-edition/
http://www.algodoo.com/
http://www.alsa-project.org/main/index.php/Matrix:Main
http://www.chkrootkit.org/
http://www.codeweavers.com/compatibility/browse/rank
http://www.codeweavers.com/products/cxoffice
http://www.cyberciti.biz/faq/unix-linux-whereis-command-examples-to-locate-binary/
http://www.debian.org/
http://www.debian.org/doc/FAQ/ch-pkgtools.en.html%20
http://www.dosemu.org/
http://www.dvdstyler.org/en/
http://www.forensicswiki.org/wiki/Ddrescue
http://www.freeciv.org/
http://www.freedesktop.org/wiki/Software/PulseAudio/
http://www.freedesktop.org/wiki/Software/PulseAudio/Documentation/User/
http://www.freedesktop.org/wiki/Software/PulseAudio/FAQ/
http://www.freeos.com/guides/lsst
http://www.frozen-bubble.org/
http://www.gimp.org/
http://www.gnu.org/software/chess
http://www.gnu.org/software/libc
http://www.grisbi.org/
http://www.ibm.com/developerworks/library/l-linux-kernel
http://www.icewm.org/
http://www.kde.org/
http://www.keepassx.org/
http://www.kernel.org/
http://www.kornelix.net/fotoxx/fotoxx.html
http://www.libreoffice.org/discover/draw/
http://www.linfo.org/runlevel_def.html
http://www.linux.com/learn/tutorials/309527-understanding-linux-file-permissions
http://www.linux.com/learn/tutorials/309527-understanding-linux-file-permissions
http://www.linuxcommand.org/
http://www.linuxdevcenter.com/linux/cmd/
http://www.linuxjournal.com/content/bluetooth-hacks
http://www.linux.org/
http://www.maartenbaert.be/simplescreenrecorder/
http://www.makelinux.net/kernel_map
http://www.mepis.org/docs/en/index.php?title=Dosemu
http://www.mesa3d.org/intro.html
http://www.mxlinux.org/download-links
http://www.ncftp.com/libncftp/doc/ftp_overview.html
http://www.newbreedsoftware.com/defendguin
http://www.nirsoft.net/utils/product_cd_key_viewer.html
http://www.nvidia.com/Download/index.aspx
http://www.opengl.org/
http://www.openshot.org/
http://www.openssh.com/manual.html
http://www.pirules.org/addons/gcontactsync/
http://www.plop.at/
http://www.plop.at/
http://www.rastersoft.com/programas/devede.html
http://www.sane-project.org/sane-mfgs.html
http://www.techspot.com/guides/287-default-router-ip-addresses/
http://www.tecmint.com/35-practical-examples-of-linux-find-command/
http://www.thegeekstuff.com/2012/03/locate-command-examples/
http://www.thegeekstuff.com/2013/04/linux-which-whatis-whereis/
http://www.tldp.org/LDP/Bash-Beginners-Guide/html/Bash-Beginners-Guide.html
http://www.tuxpaint.org/
http://www.twotoasts.de/index.php/catfish/
http://www.videolan.org/vlc/
http://www.virtualbox.org/
http://www.virtualbox.org/wiki/Downloads%20
http://www.wesnoth.org/
http://www.winehq.org/
http://www.winmd5.com/
http://www.xfce.org/
http://www.xfce.org/about
http://xpenguins.seul.org/

{the first one is suspicious but the others look okay}
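
A few entries in the list end in "_\" (the Alien_, APT_ and Dd_ Wikipedia links, for example). A likely cause, stated here as an assumption about how the PDF stores its links, is that parentheses inside a link are written as \( and \), so cutting on "(" stops at the first escaped one. A rough sketch that instead takes everything between "/URI (" and the closing ")>>", removes the backslashes, and drops duplicates. The function name and the ")>>" line ending are assumptions based on the sed in the command above.

```shell
# extract_pdf_uris
# Reads `strings` output on stdin and prints one URL per line.
# Assumes each link looks like "/URI (...)>>", possibly without the space,
# and that parentheses inside the link are backslash-escaped.
extract_pdf_uris() {
    sed -n 's|.*/URI *(\(.*\))>>.*|\1|p' |   # keep the text between "(" and ")>>"
    sed 's|\\\([()]\)|\1|g' |                # turn \( and \) back into ( and )
    sort -u                                  # sort and remove duplicates
}
```

Possible usage: `strings /usr/local/share/doc/mxum.pdf | extract_pdf_uris`. Lines that do not have that shape are skipped without a message, so it is worth comparing the line count against the original pipeline.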



I'll leave it to someone else to work out how to check the URLs (maybe with wget --spider <url>).
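
For the checking step, a minimal sketch along the lines of the wget --spider idea. It assumes a text file with one URL per line; the function name and the CHECK_CMD variable are made up for illustration (CHECK_CMD only exists so the loop can be tried without a network connection), and the timeout and retry values are arbitrary.

```shell
# check_urls FILE
# Print "OK   url" or "FAIL url" for each unique URL in FILE.
# By default each URL is probed with wget in spider mode (nothing is saved).
# CHECK_CMD is a hypothetical override for trying the loop offline.
check_urls() {
    sort -u "$1" | while IFS= read -r url; do
        [ -n "$url" ] || continue          # skip empty lines
        if ${CHECK_CMD:-wget --spider -q --timeout=10 --tries=1} "$url"; then
            printf 'OK   %s\n' "$url"
        else
            printf 'FAIL %s\n' "$url"
        fi
    done
}
```

Possible usage: `check_urls S2-URLs.txt | grep '^FAIL'` to list only the links that need attention. Some sites refuse this kind of request, so a FAIL is a prompt to check by hand rather than proof that the link is dead.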

Moltke
Posts: 229
Joined: Tue Dec 19, 2017 6:07 pm

#7 Post by Moltke »

I don't know about a script, but what about this method? https://www.craig-edmonds.com/extract-u ... -for-free/ I use it and it works like a charm; I've extracted lots of URLs from PDF files with no hassle at all. Just follow the instructions and that's it.

Hope this helps!
Without each other's help there ain't no hope for us

#8 Post by Jerry3904 »

Thanks. I tried that a couple of days ago and just got error messages about an ad blocker being on that I couldn't solve, so that's when I started this route.

#9 Post by Moltke »

Jerry3904 wrote: Sun Feb 24, 2019 4:24 pm Thanks. I tried that a couple of days ago and just got error messages about an ad blocker being on that I couldn't solve, so that's when I started this route.
Weird, I've never run into that, and I use uBlock Origin. Oh well, sorry to hear (read) that. Hope you find a solution.

#10 Post by Moltke »

After re-reading your first post I remembered that some time ago I was looking for an easier way to rank mirrors in Arch. Reading the ArchWiki, I found this article on the topic, https://wiki.archlinux.org/index.php/Mi ... irror_list, which uses awk to extract all servers for, say, a given country name:
If the servers in the file are grouped by country, one can extract all the servers of a specific country by using:
$ awk '/^## Country Name$/{f=1}f==0{next}/^$/{exit}{print substr($0, 2)}' /etc/pacman.d/mirrorlist.backup
I think you might adapt that into yours and see how that goes.
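
As a rough idea of what such an adaptation might look like (a sketch only, assuming the wanted links are on lines starting with "/URI (" as in the first post; the function name is made up):

```shell
# uri_with_awk FILE
# For every line starting with "/URI (", strip that prefix and the last ")"
# together with anything after it, then print what is left.
uri_with_awk() {
    awk '/^\/URI \(/ {
        sub(/^\/URI \(/, "")     # drop the leading "/URI ("
        sub(/\)[^)]*$/, "")      # drop the final ")" and whatever follows it
        print
    }' "$1"
}
```

Possible usage: `uri_with_awk S2-URLs.txt`. Unlike the mirrorlist example, nothing here depends on the lines being grouped together; each line is judged on its own.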