Cool! Thank you. I accidentally found Arthur's Classic Novels quite a while ago, but I'd forgotten about it, and he seems to have added a lot more since I last looked.
[rubs hands together mwahahahah] :) I just finished a fairly long download (681 MB) of all the science fiction titles there using this sweet command:
wget -m -k -p -E -l 0 -np http://arthursbookshelf.com/sci-fi/index.html
wget downloads stuff to the current directory, that is, wherever your terminal is when you issue the command (use pwd to find out where that is)
-m mirror mode; it's shorthand for -r (recursive) plus the next three options, which is why they're listed here even though they aren't typed in the command
-N don't re-retrieve files unless they're newer (useful if you need to interrupt and restart later)
-l inf infinite levels of recursion depth
-nr don't remove '.listing' files (useful for FTP sites)
-k converts all links to point to local files (so the pages work offline)
-p gets all parts of pages (e.g. images) even if outside the recursion tree
-E save all html pages with .html extension (good for php and asp pages)
-l 0 (same as -l inf) seems to be needed sometimes even with -m
-np don't ascend to parent directories
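If the single-letter soup is hard to read, here's the same command spelled out with the long option names (note that on older wget builds -E is spelled --html-extension rather than --adjust-extension):
wget --mirror --convert-links --page-requisites --adjust-extension --level=0 --no-parent http://arthursbookshelf.com/sci-fi/index.html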
Some sites need -e robots=off, which tells wget to ignore the site's robots.txt, the file that asks robots to stay out of certain areas. Arthur's Classic Novels doesn't need it. In any case, be very careful about ignoring robots.txt restrictions. They can be for your own good as well as the site's, because you could end up downloading volumes of useless data from old forums, wasting time, bandwidth, and disk space.
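If you do need it, it just goes on the command line like any other option, for example (example.com here is only a placeholder, not a real suggestion):
wget -m -k -p -E -np -e robots=off http://example.com/books/index.html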
Another thing to be careful of: don't use the full power of a high-speed connection for this. If you have some rate-limiting software, good; otherwise you can use --wait=SECONDS to wait SECONDS between retrievals and --limit-rate=RATE to limit the download rate to RATE.
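Something like this, say two seconds between files and capped at 100 KB/s (pick whatever numbers seem polite):
wget -m -k -p -E -l 0 -np --wait=2 --limit-rate=100k http://arthursbookshelf.com/sci-fi/index.html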
wget has incredible capabilities. You can give it a list of file types you want, or a list of file types to specifically exclude, and you can also exclude certain directories. It can do lots, lots more.
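For instance, to keep only the zip and html files and skip a directory you don't want (the audio directory below is made up, just to show the syntax), something like this should do it:
wget -m -k -p -E -np -A zip,html -X /sci-fi/audio http://arthursbookshelf.com/sci-fi/index.html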
no subject
Date: 2010-09-26 06:59 am (UTC)
Here's a bunch of e-book stuff you might like, if you don't know already:
http://www.metafilter.com/95773/Arthurs-Classic-Novels-his-Love-of-Mankind-and-the-Internet
no subject
Date: 2010-09-26 08:11 am (UTC)
By the way, I've added you to my circle too.
no subject
Date: 2010-09-27 07:58 am (UTC)
wget http://www.gutenberg.org/files/30452/30452-h.zip
wget http://www.gutenberg.org/files/33016/33016-h.zip
wget http://www.gutenberg.org/files/30124/30124-h.zip
wget http://www.gutenberg.org/files/31168/31168-h.zip
wget http://www.gutenberg.org/files/31893/31893-h.zip
wget http://www.gutenberg.org/files/30166/30166-h.zip
wget http://www.gutenberg.org/files/30532/30532-h.zip
wget http://www.gutenberg.org/files/29390/29390-h.zip
wget http://www.gutenberg.org/files/29768/29768-h.zip
wget http://www.gutenberg.org/files/30691/30691-h.zip
wget http://www.gutenberg.org/files/28617/28617-h.zip
wget http://www.gutenberg.org/files/30177/30177-h.zip
wget http://www.gutenberg.org/files/29198/29198-h.zip
wget http://www.gutenberg.org/files/29848/29848-h.zip
wget http://www.gutenberg.org/files/29607/29607-h.zip
wget http://www.gutenberg.org/files/29809/29809-h.zip
wget http://www.gutenberg.org/files/29919/29919-h.zip
wget http://www.gutenberg.org/files/29882/29882-h.zip
wget http://www.gutenberg.org/files/29255/29255-h.zip
I really should have had the script unzip the files and rename them with the name and date of each magazine too, but I'll do that later; something like the sketch below would do it.
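For anyone doing the same, a rough sketch of the unzip step (the rename at the end is just the pattern, with a made-up title and date; you'd look up the real ones on each book's Gutenberg page):
for f in *-h.zip; do
  id=${f%-h.zip}                            # e.g. 30452
  mkdir -p "$id" && unzip -o "$f" -d "$id"  # one directory per e-text
done
# then rename by hand once you know what each issue is, e.g.
# mv 30452 "Magazine Name YYYY-MM"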