Random Nonsense / Geek Stuff All those random tech ramblings you can't fit anywhere else! |
11-07-2003, 02:04 AM | #1 |
Cooling Savant
Join Date: Oct 2002
Location: midwest side, yo
Posts: 596
|
wget help, or something like that...
i've used wget here and there, and it's overall an awesome program. but, i'm looking to do something that wget may not be able to do (or i may have not stumbled across it yet...).
as far as i can tell, wget only spiders a website and finds files by following links. what i'm looking to do is pull everything on a website, not just the things that can be spidered. for example, if a site has an index.html but also has unlinked html and content (we'll say ass.html and ass.jpg), how can i grab everything? what if it's in a directory off root, but not linked?
this would be rather useful, i hope you get what i'm talking about. will wget grab it with a switch i'm not using, or is there something else out there that can do it? so far i haven't found anything that will grab stuff unless it's linked in an html file.
__________________
:shrug: |
11-07-2003, 02:44 AM | #2 |
Cooling Savant
Join Date: Sep 2002
Location: Saskatoon, Saskatchewan
Posts: 294
|
It can't be done, I don't think. Unless the program has some way of finding all the files on a site, it can't grab them for you. So if they're unlinked, you are screwed.
__________________
Can anyone else here say that they have a watercooled monster that's 45" tall? |
11-07-2003, 11:25 AM | #3 |
Cooling Savant
Join Date: Feb 2002
Location: Dione, sector 4s1256
Posts: 852
|
why don't you just grep for anything with *.htm *.html etc...
and build your own index.html for that particular site, making sure all is linked, and then feed this new file to the spider...
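a rough sketch of that idea in python, assuming you have shell access and the site's files sit under one local directory. the function name `build_index` and the output name `all_files.html` are made up for illustration, nothing here comes from wget itself:

```python
import html
import os
from urllib.parse import quote


def build_index(site_root, extensions=(".htm", ".html")):
    """Return an HTML page that links every matching file under site_root.

    Pass extensions=None to link every file, not just html.
    """
    rel_paths = []
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            if extensions is not None and not name.lower().endswith(tuple(extensions)):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), site_root)
            # Use forward slashes so the path works as a URL on any OS.
            rel_paths.append(rel.replace(os.sep, "/"))

    items = [
        '<li><a href="%s">%s</a></li>' % (quote(rel), html.escape(rel))
        for rel in sorted(rel_paths)
    ]
    return "<html><body><ul>\n%s\n</ul></body></html>\n" % "\n".join(items)


if __name__ == "__main__":
    # Hypothetical usage: run from the site's document root, then point
    # the spider at the generated all_files.html.
    with open("all_files.html", "w") as out:
        out.write(build_index(".", extensions=None))
```

the links are relative to wherever the generated file is saved, so it should live in the directory you passed as `site_root`.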
__________________
There is no Spoon.... |
11-07-2003, 01:53 PM | #4 |
Been /.'d... have you?
Join Date: Jul 2002
Location: Moscow, ID
Posts: 1,986
|
Try Mozilla Firebird and the spider plugins. Those work pretty damn good.
__________________
#!/bin/sh {who;} {last;} {pause;} {grep;} {touch;} {unzip;} mount /dev/girl -t {wet;} {fsck;} {fsck;} {fsck;} {fsck;} echo yes yes yes {yes;} umount {/dev/girl;zip;} rm -rf {wet.spot;} {sleep;} finger: permission denied |
11-07-2003, 01:55 PM | #5 |
Cooling Savant
Join Date: Oct 2002
Location: midwest side, yo
Posts: 596
|
thanks for the replies, i kinda thought there wasn't a way to do it.
the grep idea is a good one, i can use that on some of my stuff (except where i don't have shell access, of course). i'll have to give that a whirl, i've got one site that has a ton of crap on it, and i don't even know what half of it is... lol.
as long as stuff is linked in the file, it doesn't matter where it is, right? if i bury the index file somewhere on the server off the site's root, should it still find everything, or only things that are in directories below it?
the only other way i can think of is if somehow you could get apache on the target server to return a directory listing, like it does for a directory without index.html, even though the directory does have an index file. i'm guessing there isn't really a way to do that. not really worried about stuff with htaccess protection on it, but that would be a plus in some cases.
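on the "only things below it?" part: as far as i know, wget's recursive mode follows links anywhere on the same host unless you pass `--no-parent`, which keeps it from climbing above the starting directory. a small python sketch that just builds the command line (it doesn't run anything); the helper name `build_wget_command` and the example.com url are placeholders, not a real site:

```python
import shlex


def build_wget_command(start_url, stay_below_start=False, depth=5):
    """Build an argument list for a recursive wget run.

    stay_below_start=True adds --no-parent, so wget will not follow
    links into directories above the starting page.
    """
    cmd = ["wget", "--recursive", "--level=%d" % depth]
    if stay_below_start:
        cmd.append("--no-parent")
    cmd.append(start_url)
    return cmd


# Hypothetical start page buried a few directories down.
url = "http://example.com/some/dir/all_files.html"
print(shlex.join(build_wget_command(url)))
print(shlex.join(build_wget_command(url, stay_below_start=True)))
```

so with a buried index file you'd want to leave `--no-parent` off, otherwise anything linked from above that directory gets skipped.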
__________________
:shrug: |
11-07-2003, 02:50 PM | #6 | |
Cooling Savant
Join Date: Oct 2002
Location: midwest side, yo
Posts: 596
|
Quote:
firebird kicks ass. i installed a few plugins last night, including the 'popup counter'. that's actually pretty cool, not sure what i'd do without the blocker. for work i have to go through a lot of sites with tons of popups, which totally blows. it's kinda interesting to see which ones have how many.
tried out the spiderzilla thing, works pretty well, kinda handy since it's built into the browser. works just like wget.
another thing that i found totally spectacularly cool is the "view in IE". ie has pretty much disappeared, i used to be able to right click on the file and "open with" but it doesn't do that anymore. now i have a shortcut on the desktop for ie, and have to go through all that crap. but with this, on any page that is open, or any link in any page, you just right click and view in ie. totally priceless for a webdesigner, whoever thought up that extension is a genius.
heh... i even installed a skin. usually i totally despise skins, but this one is verra nice. with the tiny buttons, it makes the window frame very very small, and leaves more viewing area. less BS. the skin is "breeze"... snag it off the site.
__________________
:shrug: |
11-07-2003, 08:17 PM | #7 |
Been /.'d... have you?
Join Date: Jul 2002
Location: Moscow, ID
Posts: 1,986
|
And another convert is created ... *sigh*
It is sad that Firebird is superior to IE in every way but nobody ever gives it a try. Everyone I've gotten to use it once never uses IE anymore.
__________________
#!/bin/sh {who;} {last;} {pause;} {grep;} {touch;} {unzip;} mount /dev/girl -t {wet;} {fsck;} {fsck;} {fsck;} {fsck;} echo yes yes yes {yes;} umount {/dev/girl;zip;} rm -rf {wet.spot;} {sleep;} finger: permission denied |
11-07-2003, 09:08 PM | #8 |
Cooling Savant
Join Date: Oct 2002
Location: midwest side, yo
Posts: 596
|
yeah, i used to use netscape, although i really don't like all the netscape addons. i just want a browser that works well, nothing more.
using firebird from a design perspective is great, because it's more strict on code, and 99% of the time if it works in firebird, it works everywhere else. IE, however, has a terrible, terrible time rendering web documents. and if a page does get broken in IE, i find it much more difficult to fix a page for IE than to fix an IE page for mozilla. ugh.
my other bitch about IE is freaking png's. IE has built-in support for png, but it's very difficult to get it to work (lots of code, yuck!). everything else on the planet supports png's, and they're so much nicer. i wish M$ would pull their head out of their asses.
__________________
:shrug: |