Site Design for Better Search Engine Positioning - Part III: Log Analysis Data & Uses for Better Site Development and Positioning
In our last article on search engine positioning, we mentioned
using log analysis to determine when search engine spiders visited
your site. We feel our readers could benefit from more in-depth
help with that suggestion. In this article, we look at log analysis:
what the data is, how and where to get it, and the tools that make
log analysis tasks easier. We also explore how to use log analysis
data to support your affiliate strategy.
What is Log Analysis Data?
Log analysis data is raw information supplied to you by your host,
or by a third-party tracking site, that records specific information
about your visitors, their traffic patterns, and the pages they viewed
on your site. It can include screen resolution, browser, operating
system, and many other details about a visitor's computer settings.
In raw form, it looks like the following:
66.150.40.221 - - [11/Jan/2004:18:44:54 -0800] "HEAD /html/tutorials/webmaster/index.htm HTTP/1.1" 200 0 "-" "InternetSeer.com"
64.68.82.208 - - [11/Jan/2004:19:34:08 -0800] "GET /robots.txt HTTP/1.0" 404 42586 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.208 - - [11/Jan/2004:19:34:09 -0800] "GET /pics/nav/texttell.swf HTTP/1.0" 200 2109 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
These are only a few lines taken from one of our server logs.
As we said in the last article, paid hosting usually includes
access to your visitor logs. Yahoo provided the above information
to us. We download our log files daily and use the log analysis
data to make decisions about how we run our site. We also use it
to see when spiders and bots visit our site. Notice that Googlebot
appears twice and InternetSeer.com appears once. All three of the
entries above come from spiders and bots.
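To make the idea concrete, here is a minimal sketch of how the sample lines above could be split into fields and checked for spiders. It assumes your host writes logs in the same layout as our sample. The names LOG_PATTERN, parse_line, is_bot, and BOT_HINTS are our own for illustration, and the list of bot hints is only a starting guess that you would extend from your own logs.

```python
import re

# Pattern for the log layout shown in the sample lines above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Assumption: substrings that commonly appear in spider user agents.
BOT_HINTS = ("bot", "spider", "crawler", "internetseer")

def is_bot(entry):
    """Guess whether a parsed entry came from a spider, based on its user agent."""
    agent = entry["agent"].lower()
    return any(hint in agent for hint in BOT_HINTS)

sample = [
    '66.150.40.221 - - [11/Jan/2004:18:44:54 -0800] "HEAD /html/tutorials/webmaster/index.htm HTTP/1.1" 200 0 "-" "InternetSeer.com"',
    '64.68.82.208 - - [11/Jan/2004:19:34:08 -0800] "GET /robots.txt HTTP/1.0" 404 42586 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

# Print when each spider visited and which file it asked for.
for line in sample:
    entry = parse_line(line)
    if entry and is_bot(entry):
        print(entry["time"], entry["agent"], entry["path"])
```

Matching on the user agent is only a rough guess, since any visitor can send any user agent string it likes.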
Whenever anything makes a request to your site, it shows up
in the log analysis data. There are many important uses for this
data. By following URLs and IP addresses back to the pages they
reference, you can find out whether someone is stealing your bandwidth.
You can use the log analysis data to see which sites are linking
to you. You can then visit those sites and get a feel for what kind
of visitor they send you. The full list of uses for log analysis
data is too long for this article. Bottom line: log analysis data
is a vital tool for a developing webmaster.
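As a rough illustration of those two uses, the sketch below counts the outside sites that appear in the referrer field and lists media files that another site pulls directly from your server. The sample lines are invented for the example (our real sample above has no referrers). MY_HOST, IMAGE_EXTENSIONS, and the function names are hypothetical, so substitute your own domain and file types.

```python
import re
from collections import Counter
from urllib.parse import urlparse

MY_HOST = "www.example.com"  # assumption: replace with your own domain

# Captures the requested path and the referrer from one log line.
LINE = re.compile(r'"\S+ (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"')

# Assumption: file types worth checking for bandwidth theft.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".swf")

def external_referrers(lines):
    """Count requests per outside host named in the referrer field."""
    hosts = Counter()
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        host = urlparse(m.group("referrer")).netloc
        if host and host != MY_HOST:
            hosts[host] += 1
    return hosts

def hotlinked_files(lines):
    """List (path, referring host) pairs where another site embeds our media."""
    found = []
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        host = urlparse(m.group("referrer")).netloc
        if host and host != MY_HOST and m.group("path").lower().endswith(IMAGE_EXTENSIONS):
            found.append((m.group("path"), host))
    return found

# Invented sample lines, using reserved example addresses and domains.
sample = [
    '192.0.2.10 - - [11/Jan/2004:20:01:00 -0800] "GET /index.htm HTTP/1.1" 200 5120 "http://links.example.org/webmaster.htm" "Mozilla/4.0"',
    '192.0.2.11 - - [11/Jan/2004:20:02:00 -0800] "GET /pics/logo.gif HTTP/1.1" 200 2048 "http://forum.example.net/thread.htm" "Mozilla/4.0"',
    '192.0.2.12 - - [11/Jan/2004:20:03:00 -0800] "GET /pics/logo.gif HTTP/1.1" 200 2048 "http://www.example.com/index.htm" "Mozilla/4.0"',
]

print(external_referrers(sample))
print(hotlinked_files(sample))
```

A referrer of "-" means the visitor sent none, so those lines are skipped by both functions.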
But What Does all that Log Analysis Data Mean?
Remember that each time your site is visited, one of those lines
is generated in the log. Anytime a person points a browser to a URL
within your site, it adds another line to the log analysis data.
Anytime you run an HTML validator on one of your pages, it adds
another line. You guessed it: anytime a bot drops by to visit your
site, that adds a line as well. Are you starting to see how important
log analysis data can be? Keep in mind that this logging also
includes the files your own pages call from your server, such as
images.
64.68.82.208 - - [11/Jan/2004:19:34:08 -0800] "GET /robots.txt HTTP/1.0" 404 42586 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Let's start at the beginning of the log line: the numbers separated
by dots (64.68.82.208 in the line above). That is the IP address
of the visitor. By doing a lookup, you can determine who owns that
address, which is usually the visitor's ISP or hosting company. If
they are stealing your bandwidth, further investigation could reveal
whom to contact to file a complaint. It could also give you the email
address of the person to contact to request that they stop.
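One way to start that investigation is to count how many requests each address made and then look up the busiest ones. The sketch below is only an outline. The function names hits_per_ip and reverse_lookup are ours, and reverse_lookup needs a working network connection, so it is defined here but not called. A reverse lookup returns a host name only; finding the owner's contact details usually takes a separate whois query.

```python
import socket
from collections import Counter

def hits_per_ip(lines):
    """Count requests per client address (the first field of each log line)."""
    return Counter(line.split(" ", 1)[0] for line in lines if line.strip())

def reverse_lookup(ip):
    """Return the host name registered for an address, or None if none is found."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

sample = [
    '66.150.40.221 - - [11/Jan/2004:18:44:54 -0800] "HEAD /html/tutorials/webmaster/index.htm HTTP/1.1" 200 0 "-" "InternetSeer.com"',
    '64.68.82.208 - - [11/Jan/2004:19:34:08 -0800] "GET /robots.txt HTTP/1.0" 404 42586 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '64.68.82.208 - - [11/Jan/2004:19:34:09 -0800] "GET /pics/nav/texttell.swf HTTP/1.0" 200 2109 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

# Busiest addresses first; pass any of them to reverse_lookup() when online.
for ip, count in hits_per_ip(sample).most_common():
    print(ip, count)
```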
The log analysis data will also show you where most of your visitors
come from. It will reveal even more clues to what their habits and
interests are. You could then use that information to adjust your
affiliate programs to fall in line with those habits and interests.
64.68.82.208 - - [11/Jan/2004:19:34:08 -0800] "GET /robots.txt HTTP/1.0" 404 42586 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
The next part of the log analysis data is the date and time
([11/Jan/2004:19:34:08 -0800] in the line above). Note that it
contains an offset, -0800, which is the difference between the
server's local time and GMT. If you want to micromanage, you can
use this information to determine the best time of day to place
a particular ad on your site. You can also use it to determine
the best time to take your site down for maintenance or content
upgrades.
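For example, a short script can read the date and time field, including the offset, and tally requests by hour of the day. This is a sketch that assumes the timestamp layout shown in our sample line. The names TIMESTAMP, parse_timestamp, and hits_per_hour are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# The date and time sit between square brackets in each log line.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")

def parse_timestamp(line):
    """Return the request time as an offset-aware datetime, or None."""
    m = TIMESTAMP.search(line)
    if not m:
        return None
    # %z reads the trailing offset such as -0800.
    return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")

def hits_per_hour(lines):
    """Count requests per hour of the day, in the server's local time."""
    hours = Counter()
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp:
            hours[stamp.hour] += 1
    return hours

sample = [
    '66.150.40.221 - - [11/Jan/2004:18:44:54 -0800] "HEAD /html/tutorials/webmaster/index.htm HTTP/1.1" 200 0 "-" "InternetSeer.com"',
    '64.68.82.208 - - [11/Jan/2004:19:34:08 -0800] "GET /robots.txt HTTP/1.0" 404 42586 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '64.68.82.208 - - [11/Jan/2004:19:34:09 -0800] "GET /pics/nav/texttell.swf HTTP/1.0" 200 2109 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

print(hits_per_hour(sample))

# The same moment expressed in GMT, using the offset from the log line.
print(parse_timestamp(sample[1]).astimezone(timezone.utc))
```

The quietest hours in the tally are the natural candidates for maintenance windows.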
64.68.82.208 - - [11/Jan/2004:19:34:08 -0800] "GET /robots.txt HTTP/1.0" 404 42586 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
The next parts of the log analysis data report the request method,
the URL of the requested file, and the protocol version
("GET /robots.txt HTTP/1.0" in the line above). You can use this
data to identify files missing from your server, the paths requested
by your own HTML, and the pages on other sites that link to you
with broken links. You can then correct the path in your HTML, or
remove the HTML that requests the file.
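A simple way to find those missing files is to pull out the request field together with the three-digit number that follows it, and list every path that was answered with 404 (the standard "not found" status). The sketch below makes that assumption. REQUEST and missing_files are names we chose for illustration.

```python
import re
from collections import Counter

# Request method, path, and protocol, followed by the status number.
REQUEST = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" (?P<status>\d{3}) '
)

def missing_files(lines):
    """Count how often each requested path was answered with status 404."""
    missing = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m and m.group("status") == "404":
            missing[m.group("path")] += 1
    return missing

sample = [
    '66.150.40.221 - - [11/Jan/2004:18:44:54 -0800] "HEAD /html/tutorials/webmaster/index.htm HTTP/1.1" 200 0 "-" "InternetSeer.com"',
    '64.68.82.208 - - [11/Jan/2004:19:34:08 -0800] "GET /robots.txt HTTP/1.0" 404 42586 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '64.68.82.208 - - [11/Jan/2004:19:34:09 -0800] "GET /pics/nav/texttell.swf HTTP/1.0" 200 2109 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

# Most frequently requested missing paths first.
for path, count in missing_files(sample).most_common():
    print(path, count)
```

In our sample, the only missing file is /robots.txt, which Googlebot asked for and did not find.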
By James R. Sanders
January 24, 2004
About the Author
James R. Sanders is the owner of Sanders
Consultation Group Plus. He has been a webmaster and website designer since
1997. He has also been involved in self employment ventures since 1992. He is
presently a contributing author of NewbieHangout,
and has been published through WebProNews
and 4Rankings.com.
His writing is targeted to webmasters, would-be webmasters, website designers,
would-be website designers, the self-employed, and those researching
solutions to questions associated with design, business operations,
and promotion today. His goal is to provide practical information based upon
his years of experience to help webmasters, website designers, and self-employed
people achieve their goals in today's competitive global market.
This page last updated: Wednesday, October 19, 2005 11:25 AM EST