If you are like me, you like to do things just because you can, and why not. One of these things is recursive website downloading.
Before I get to that, let me explain web crawlers, like Google’s Googlebot. Their job is to search the internet ALL DAY, EVERY DAY! They go to a website and check it for links, like this.
Because the pages of a site link to each other, following those links lets a crawler index most (if not all) of a website. You can do the same thing yourself to get an offline copy of a static website (like this one).
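To make "check for links" a bit more concrete, here is a rough sketch of how you could pull the link targets out of a page yourself. The function name extract_links and the sample_page HTML are made up for illustration, and a real crawler parses HTML far more carefully than this one grep does.

```shell
# Print every href target found in the HTML read from standard input.
# (extract_links is a hypothetical helper, not part of wget.)
extract_links() {
  grep -o 'href="[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}

# A tiny made-up page standing in for a real download.
sample_page='<a href="/about/">About</a> <a href="https://example.com/">Elsewhere</a>'

# Prints /about/ and https://example.com/ on separate lines.
printf '%s\n' "$sample_page" | extract_links
```

A crawler would then visit each of those targets and repeat the same step, which is the "recursive" part.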
Here’s what you need:
The wget command (the link to download it for Windows is here)
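If you are not sure whether wget is already on your machine, a quick check like this one can tell you. The helper name have_command is just something I made up for the sketch.

```shell
# Succeed if the named command is available on this machine.
# (have_command is a hypothetical helper name.)
have_command() {
  command -v "$1" >/dev/null 2>&1
}

if have_command wget; then
  # Show which wget you have.
  wget --version | head -n 1
else
  echo "wget is not installed yet"
fi
```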
Now, it’s finally time to download. Run this command:
wget --recursive -l inf https://alphagame.dev/
Let’s explain what that command did. The first option, --recursive, told wget to download a page and keep going deeper by following the links it finds. The -l inf option told it to go down an unlimited number of levels, in case you have to go deep to find a page. Or, you can use -l 2 to follow links only two levels deep from the page you chose. If you ran this command, you will see a new folder named after the site, alphagame.dev (or whatever site you downloaded). It will have lots of files that you can open in your web browser, or host locally and browse. I will point out that this mostly only works on static web pages, as others may load content from the internet AFTER the page is loaded, which defeats the purpose.
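If you want to play with the depth without hammering a server while you experiment, one option is a small wrapper like the sketch below. The function mirror_site and the DRY_RUN variable are my own inventions for this example; only --recursive and -l come from the command above.

```shell
# Mirror a site to a chosen depth. With DRY_RUN=1 the wget command is
# printed instead of run, so you can check it before touching the server.
# (mirror_site and DRY_RUN are hypothetical names for this sketch.)
mirror_site() {
  url="$1"
  depth="${2:-inf}"   # no depth limit unless you pass one
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "wget --recursive -l $depth $url"
  else
    wget --recursive -l "$depth" "$url"
  fi
}

DRY_RUN=1
mirror_site https://alphagame.dev/ 2
# prints: wget --recursive -l 2 https://alphagame.dev/
```

Set DRY_RUN=0 (or leave it unset) when you are happy with the command and want the download to actually happen.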
While this is a cool trick, be careful how you use it, as it can lead to you spamming the server and getting blacklisted. An example is in Wikipedia’s robots.txt file. You can see that they blocked wget because it was too hard for their servers to respond to all the requests. Also, be sure to obey robots.txt when writing software. It is a way for site owners to say which parts of their site they don’t want crawled or shown in search results, so please respect their wishes.
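As a rough illustration of respecting those wishes, here is a sketch that looks through a robots.txt for a blanket block on a given user agent. The function blocks_agent and the sample_robots text are made up for this example, and the check is deliberately simplified: it ignores wildcards, Allow rules, and groups that name several agents at once.

```shell
# Succeed if the robots.txt read from standard input has a group for the
# given user agent that contains "Disallow: /".
# (blocks_agent is a hypothetical helper, not a full robots.txt parser.)
blocks_agent() {
  awk -v agent="$1" '
    BEGIN { target = "user-agent:" tolower(agent) }
    {
      line = tolower($0)
      sub(/#.*/, "", line)          # drop comments
      gsub(/[ \t\r]+/, "", line)    # drop whitespace
    }
    line ~ /^user-agent:/ { in_group = (line == target); next }
    in_group && line == "disallow:/" { found = 1 }
    END { exit !found }
  '
}

# A made-up robots.txt in the style of the rule described above.
sample_robots='User-agent: wget
Disallow: /

User-agent: *
Disallow: /private/'

if printf '%s\n' "$sample_robots" | blocks_agent wget; then
  echo "wget is asked to stay away; do not mirror this site"
else
  echo "no blanket block for wget found"
fi

# If a site does allow it, pausing between requests is kinder to the server.
# The 2 second value is an arbitrary example:
#   wget --recursive -l 2 --wait=2 --random-wait https://alphagame.dev/
```

As far as I know, wget already reads robots.txt on its own during recursive downloads, so a check like this is mostly useful when you write your own crawler.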
Site last updated: 2024-02-15 04:08:53 +0000