Pro/Forums


iroc409 11-07-2003 01:04 AM

wget help, or something like that...
 
i've used wget here and there, and it's overall an awesome program. but, i'm looking to do something that wget may not be able to do (or i may have not stumbled across it yet...).

it appears to me wget only spiders a website, and finds information that way. what i'm looking to do is pull everything in a website, not necessarily those things that can be spidered.

for example, if a site has an index.html, but has unlinked html and content in the site (we'll say ass.html and ass.jpg), how can i grab everything? what if it's in a directory off root, but not linked?

this would be rather useful, i hope you get what i'm talking about. will wget grab it with a switch i'm not using, or is there something out there that can do it? so far i haven't found anything that will grab stuff unless it's linked in an html file.

KnightElite 11-07-2003 01:44 AM

It can't be done, I don't think. Unless the program has some way of finding all the files on a site, it can't grab them for you. So if they're unlinked, you are screwed.
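To make that concrete, here is a minimal sketch of what a link-following spider can actually "see" on a page. It is not how wget is implemented; the `LinkCollector` class, the `discover` helper, and the example.com URLs are all made up for illustration. The point is only that a file nobody links to never shows up in the set of URLs to fetch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href/src targets, which is all a link-following spider has to go on."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page they were found on.
                self.links.add(urljoin(self.base_url, value))


def discover(base_url, html_text):
    """Return every URL referenced by html_text (hypothetical helper)."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


# Hypothetical index page: it links to about.html and logo.png and nothing else.
page = '<a href="about.html">about</a> <img src="logo.png">'
found = discover("http://example.com/index.html", page)

# An unlinked ass.html sitting on the same server is simply not in the set.
print(sorted(found))
```

Unless the server hands out a file list some other way, the spider has nothing to go on for the unlinked files.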

#Rotor 11-07-2003 10:25 AM

why don't you just grep for anything with *.htm *.html etc...

and build your own index.html for that particular site, making sure all is linked, and then feed this new file to the spider...
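A rough sketch of that idea, assuming you have shell or file access to the site's document root. The function name `build_index`, the output file name `spider-index.html`, and the extension list are assumptions picked for the example, not anything from wget itself.

```python
import html
import os
from urllib.parse import quote


def build_index(site_root, out_name="spider-index.html",
                extensions=(".htm", ".html", ".jpg", ".png", ".gif")):
    """Walk site_root and write a page that links to every matching file."""
    links = []
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            # Skip the generated index itself and anything we did not ask for.
            if name == out_name or not name.lower().endswith(extensions):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), site_root)
            links.append(rel.replace(os.sep, "/"))
    links.sort()

    rows = "\n".join(
        '<a href="{0}">{1}</a><br>'.format(quote(path), html.escape(path))
        for path in links
    )
    out_path = os.path.join(site_root, out_name)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write("<html><body>\n" + rows + "\n</body></html>\n")
    return links
```

Once the generated file is on the server, a recursive fetch that starts from it (for example `wget -r` pointed at its URL) has a link to follow for every file it lists.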

airspirit 11-07-2003 12:53 PM

Try Mozilla Firebird and the spider plugins. Those work pretty damn good.

iroc409 11-07-2003 12:55 PM

thanks for the replies, i kinda thought there wasn't a way to do it :(

the grep idea is a good one, i can use that on some of my stuff (except where i don't have shell access, of course). i'll have to give that a whirl, i've got one site that has a ton of crap on it, and i don't even know what half of it is... lol.

as long as stuff is linked in the file, it doesn't matter where it is, right? i can bury the index file somewhere on the server off the site's root, and it still should find everything, or only things that are in directories below it?
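On the "does it matter where the index lives" part, here is a small sketch of how links resolve relative to the page that contains them. The URLs are hypothetical. Whether a spider then actually follows links that point above the starting directory depends on its settings; with wget, as far as I know, that is what the `-np` / `--no-parent` option restricts.

```python
from urllib.parse import urljoin

# Hypothetical location of an index page buried a couple of directories down.
index_url = "http://example.com/stuff/hidden/index.html"

# Three ways the index might refer to other files on the site.
hrefs = ("page.html", "../other/pic.jpg", "/ass.html")

# Each link is resolved against the URL of the page it appears in.
resolved = {href: urljoin(index_url, href) for href in hrefs}

for href, url in resolved.items():
    print(href, "->", url)
```

So a buried index can still point anywhere on the site, as long as the links are written relative to where the index sits (or as absolute paths from the root).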

the only way i can think of is if somehow you could get apache on the target server to return a directory listing, like it does for a directory without an index.html, even though the directory does have an index file. i'm guessing there isn't really a way to do that. not really worried about stuff with htaccess protection on it, but that would be a plus in some cases.

iroc409 11-07-2003 01:50 PM

Quote:

Originally posted by airspirit
Try Mozilla Firebird and the spider plugins. Those work pretty damn good.

firebird kicks ass. i installed a few plugins last night, including the 'popup counter'.

that's actually pretty cool, not sure what i'd do without the blocker. for work i have to go through a lot of sites with tons of popups which totally blows. it's kinda interesting to see which ones have how many :)

tried out the spiderzilla thing, works pretty well, kinda handy since it's built into the browser. works just like wget :)

another thing that i found totally spectacularly cool is the "view in IE". ie has pretty much disappeared, i used to be able to right click on the file and "open with" but it doesn't do that anymore. now i have a shortcut on the desktop for ie, and have to go through all that crap.

but with this, any page that is open, or any link in any page you just right click and view in ie. totally priceless object for a webdesigner, whoever thought up that extension is a genius.

heh... i even installed a skin. usually i totally despise skins, but this one is verra nice. with the tiny buttons, it makes the window frame very very small, and more viewing area. less BS. the skin is "breeze"... snag it off the site :)

airspirit 11-07-2003 07:17 PM

And another convert is created ... *sigh*

It is sad that Firebird is superior to IE in every way but nobody ever gives it a try. Everyone I've gotten to use it once never uses IE anymore.

iroc409 11-07-2003 08:08 PM

yeah, i used to use netscape, although i really don't like all the netscape addons. i just want a browser that works well, nothing more.

using firebird from a design perspective is great, because 99% of the time if it works in firebird, it works everywhere else (and is more strict on code).

however, i find IE has a terrible, terrible time rendering web documents. and if a page does get broken in IE, i find it much more difficult to fix that page for IE than to fix an IE page for mozilla. ugh.

my other bitch about IE is freaking png's. IE has built-in support for png, but it's very difficult to get it to run (lots of code, yuck!). everything else on the planet supports png's, and they're so much nicer. i wish M$ would pull their head out of their asses.

