Xepher.Net Forums

Community => Knowhow Trading Post => Topic started by: Miluette on April 16, 2009, 06:59:23 AM

Title: Mysterious 404s
Post by: Miluette on April 16, 2009, 06:59:23 AM
Recently I changed the entire archives for both my webcomics. There should be absolutely no way to access the old files anymore, as they no longer exist. Yet, according to AWStats (which is very good at helping me weed out broken links), lots of people are still somehow clicking to old pages of mine.

I did realize part of it was due to posts on my own forum, which I've since changed, but it still happens with a variety of older pages that I'm sure I didn't link anywhere recently (or ever). I thought maybe people had bookmarked them too, but that still shouldn't happen so often, because I really doubt people bookmarked random old pages of mine!

Some other 404s seem strange to me too, lol. (And the AWStats Wii icon is broken apparently. |D) Anyone else have strange 404s?
Title: Re: Mysterious 404s
Post by: tapewolf on April 16, 2009, 01:39:22 PM
I have had a few people accessing the HTML documents on mine long after I switched to PHP.  These are all from unidentified external sources so I'm not really sure what to do about that, aside from setting up loads and loads of redirects.  They do finally seem to be tailing off, though, so I'm not as bothered.
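
For anyone facing the same "loads and loads of redirects" chore, here is a rough Python sketch that generates the lines instead of typing them by hand. It assumes each old .html page kept its name and only the extension changed to .php, and that the server is Apache with mod_alias; the function name redirect_lines and the example.com base URL are made up for illustration.

```python
from pathlib import PurePosixPath


def redirect_lines(old_paths, base_url="http://example.com"):
    """Build one Apache mod_alias 'Redirect 301' line per retired .html path.

    Assumption: /foo/bar.html now lives at /foo/bar.php on the same site.
    base_url is a placeholder; swap in the real site address.
    """
    lines = []
    for old in old_paths:
        path = PurePosixPath(old)
        if path.suffix != ".html":
            continue  # skip anything that is not an old HTML document
        new = path.with_suffix(".php")
        lines.append(f"Redirect 301 {old} {base_url}{new}")
    return lines
```

The output can be pasted into an .htaccess file or the site config. If the pages were renamed as well as converted, a hand-written mapping would be needed instead.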

The weirdest 404s I have been getting were these three:

/images/trans.gif
/java/prototype.js
/java/scriptaculous.js

...I really don't know what they are.  Nothing to do with me at all as far as I can tell.  Eventually, because these things were being continually hammered hundreds of times a month, I created a 1-pixel GIF file and some empty .js files so the noise would stop drowning out the real 404s.
Title: Re: Mysterious 404s
Post by: Databits on April 17, 2009, 01:10:04 PM
Keep in mind that Google (and other search engines) tend to catalog every single bit of your site that they're allowed to (which you can restrict with /robots.txt). So the 404 activity may very well be search engines attempting to update their records for the things you've removed. That, or you managed to kill off some hot-linked images. :P
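
One way to check that guess is to tally the 404s in the raw access log by who asked for them. The Python sketch below assumes the log is in Apache "combined" format (with referrer and user-agent fields); the function name tally_404s and the list of crawler hints are illustrative guesses, not a complete crawler detector.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user-agent fields of a
# combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r' "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: substrings that commonly show up in crawler user agents.
BOT_HINTS = ("bot", "crawl", "spider", "slurp")


def tally_404s(lines):
    """Count 404 responses per (path, source) pair.

    source is "crawler" if the user agent looks like a search engine,
    "referred" if the request carried a referrer (a link or hot-linked
    image somewhere), and "direct" otherwise (bookmark, typed URL, etc.).
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match.group("status") != "404":
            continue
        agent = match.group("agent").lower()
        if any(hint in agent for hint in BOT_HINTS):
            source = "crawler"
        elif match.group("referer") not in ("", "-"):
            source = "referred"
        else:
            source = "direct"
        counts[(match.group("path"), source)] += 1
    return counts
```

Feeding it an open log file, e.g. tally_404s(open(log_path)) with log_path pointing at wherever the host keeps the access log, and printing counts.most_common(20) should show whether the old pages are mostly being hit by crawlers or by real links.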

For those who don't know about robots.txt, this is a good start:
http://www.robotstxt.org/

It's simply a text file; there's not really a whole lot to it. But it's useful if you don't want some things being indexed by search engines.
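
To give an idea of how little there is to it, here is a small Python sketch that holds a two-line robots.txt and checks which paths it lets a crawler fetch, using the standard library's urllib.robotparser. The /old-archive/ directory and the allowed() helper are made up for the example; a real file would list whatever you actually want kept out of the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: every crawler ("*") is asked to stay
# out of /old-archive/ and may fetch everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-archive/
"""


def allowed(path, agent="*"):
    """Return True if the rules above let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)
```

Here allowed("/old-archive/page1.html") comes back False and allowed("/comic/page1.html") comes back True. Note that robots.txt is only a request that well-behaved crawlers honor; it does not block anyone from asking for a page.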
Title: Re: Mysterious 404s
Post by: Miluette on April 18, 2009, 05:19:14 AM
I was gonna look into the robots.txt thing eventually.

The things still being accessed are still in areas I wouldn't mind being indexed. On that note, I think I know what's causing it, or part of. *runs to Google webmasters~*