Web Log Analysis: Who's Doing What, When?

by Glenn Fleishman
Reprinted from Web Developer® magazine, Vol. 2 No. 2 May/June 1996 © 1996

Analyzing your Web site's traffic can be a fascinating study of how users traverse your pages, but it can also lead to information overkill. For example, every month my company logs about five or six times more bytes recording information about the visits to our clients' Web sites than the Web sites themselves actually contain! Ideally, you'd want to preserve this information in a form that combines analysis with rapid access.

A commercial product may not be ideal for many sites. The software can range in price from a few hundred to several thousand dollars for "local" analysis, where you run the software on one of your own machines feeding information to and from relational databases. You could also expect to pay up to thousands of dollars a month for services like NetCount or Internet Profiles (I/Pro) that require you to transmit the log files regularly--even hourly--to a remote site, where the analysis is conducted and reports generated.

Determining if a commercial product would benefit your company depends in large part on how highly customized an analysis you need. If your needs are fairly general, there are many simple shareware programs available for analyzing data locally. If you want to analyze unique visits in a comprehensive way, it makes sense to migrate to a commercial solution or invest in some significant in-house development.

This issue's column will show you how to do quick-and-dirty analyses using your own resources and the code samples provided. You'll also learn how to decipher datestamps for user analysis, and how to think algorithmically about generating site information. In the end, you may decide to opt for a commercial solution, but you'll do so in an informed manner.

Log Formats

There are several kinds of log formats, but here we'll exclusively address the Common Log Format (CLF), which can be used by most Web servers. Some servers, such as the Open Market series, Netscape servers, and the Microsoft Internet Information Server, log information in slightly different manners, but they function in basically the same way.

In addition to the fields shown in the CLF (see "Common Log Format"), there are several other useful pieces of information that your visitors' browsers may reveal, as I discussed in last issue's column. These can help you determine what kinds of users are visiting the site:

  • RefererURL: contents of the REFERER_URL client variable (CGI programs see it as HTTP_REFERER). This is the URL of the page the browser was displaying immediately before the user came to your site.

  • Client type: contains the contents of the HTTP_USER_AGENT client variable. This includes the browser's name, version, and user platform.

  • Cookie: contents of the HTTP_COOKIE variable. Cookies are persistent tokens identifying a unique user that browsers and servers can pass back and forth across sessions. They work only with certain browsers: Netscape Navigator supports them fully, and other browsers do so to a lesser extent.

Intersé Corp. is one commercial vendor that adds proprietary extensions to the CLF to include these three variables. The Intersé Extended Log Format uses the following names to refer to these three fields: "referer," "browser," and "cookie".

These three fields appear in the Intersé Extended Log Format after the byte count, as in this example:

spaghetti.west.edu - - [28/Feb/1996:06:09:53 -0800]
"GET /film/reviews/D/dangerous.minds.horton.html HTTP/1.0" 200 3828
"http://search.yahoo.com/bin/search?p=dangerous+minds"
"Mozilla/1.22 (Windows; I; 32bit)" "211.63.0.255.8445454454"

At my company, we've modified the CERN 3.0 server to support this format, which is useful for other purposes as well.
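To make the field layout concrete, here is a minimal Python sketch that splits one log entry into named fields. It is an illustration, not the code used on the modified CERN server: the regular expression, the function name parse_line, and the dictionary keys are all assumptions chosen for this example. The three quoted fields at the end are treated as optional, so a plain CLF entry parses as well. The sample entry is modeled on the one above, joined onto a single line.

```python
import re

# Common Log Format fields, optionally followed by three quoted fields
# (referer, browser, cookie). Pattern and key names are illustrative.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<logname>\S+) '
    r'\[(?P<datestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<browser>[^"]*)" "(?P<cookie>[^"]*)")?\s*$'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # A "-" byte count means no body was sent; treat it as zero here.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    fields["status"] = int(fields["status"])
    return fields


sample = (
    'spaghetti.west.edu - - [28/Feb/1996:06:09:53 -0800] '
    '"GET /film/reviews/D/dangerous.minds.horton.html HTTP/1.0" 200 3828 '
    '"http://search.yahoo.com/bin/search?p=dangerous+minds" '
    '"Mozilla/1.22 (Windows; I; 32bit)" "211.63.0.255.8445454454"'
)
record = parse_line(sample)
print(record["host"], record["status"], record["bytes"], record["cookie"])
```

For an entry without the three extra fields, the referer, browser, and cookie keys come back as None, which lets later code fall back to the hostname.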

What's in a Visit?

If you've ever tried to buy or sell advertising on the Web, you've run up against the question of what constitutes a unique impression on an ad, or, more generally, what constitutes a unique visit.

There can be a lot of confusion about these semantics; I propose the following terms based on common Net usage:

  • Visitor: a unique individual who can be tracked by registration or cookie.
  • Visit: a unique trip to a Web site, defined by a period of time during which a visitor browses the site.

So if you're asked "how many people really visit your Web site?" you can answer with a certain amount of confidence in terms of unique visits per day.

Sites that require or allow registration of users can use the LOGNAME variable in any log format to track unique visits in time (using datestamps) by known visitor (using the LOGNAME). If your site doesn't require visitor registration, there are two main options for tracking unique visits.

  • By cookie. If the server assigns cookies, 90 percent or more of return visitors will have a cookie that can be logged (based on the popularity of the browsers that support cookies). Cookies can have expiration dates, so you can choose the period of time over which data about specific users is logged. This doesn't tell you anything about an individual--unless you use a database to match cookies to registration information--but it does tell you about the unique visits by an unknown unique visitor.

  • By hostname or IP number. This is much less reliable, but for the majority of locations on the Net (especially online services) it will identify a unique simultaneous user. As noted in last issue's column, simultaneous users from America Online, Netcom, and other services will have unique hostnames or IP numbers. This method will only give you unique visits; there's no way for you to quantify unique visitors.

Both methods, tracking by cookie and tracking by hostname or IP number, require timestamp analysis.
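Timestamp analysis starts with turning the bracketed CLF datestamp into something you can subtract. The Python sketch below is one way to do that, under stated assumptions: the function names parse_datestamp and same_visit are made up for this example, the month abbreviations are assumed to be English (the parser's %b code depends on the locale), and the 30-minute cutoff is the one used in the algorithm in the next section.

```python
from datetime import datetime, timedelta

# Day/Month/Year:Hour:Minute:Second followed by a numeric zone offset.
# %b matches English month abbreviations in the default locale.
CLF_DATE_FORMAT = "%d/%b/%Y:%H:%M:%S %z"
VISIT_TIMEOUT = timedelta(minutes=30)


def parse_datestamp(text):
    """Convert '[28/Feb/1996:06:09:53 -0800]' to a timezone-aware datetime."""
    return datetime.strptime(text.strip("[]"), CLF_DATE_FORMAT)


def same_visit(previous, current, timeout=VISIT_TIMEOUT):
    """True if the current request falls within the timeout of the previous one."""
    return current - previous <= timeout


first = parse_datestamp("[28/Feb/1996:06:09:53 -0800]")
second = parse_datestamp("[28/Feb/1996:09:25:00 -0500]")  # logged in another zone
third = parse_datestamp("[28/Feb/1996:06:45:00 -0800]")

print(second - first, same_visit(first, second))
print(third - first, same_visit(first, third))
```

Because the parsed values carry their zone offsets, entries logged in different time zones can be compared directly without converting them by hand.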

Visit Analysis

The algorithm for measuring unique visits and visitors is pretty straightforward; programming it is a damn sight more complex. At one point I attempted to build my own simple user analysis program; the flaw in the program is that it really requires some kind of DBMS.

The logic is sound, though. You want to break down the top-level information: log users, then log visits. The algorithm is essentially:

Analyze line of logfile
If there's a LOGNAME, find record associated with it
Otherwise if there's a cookie, find that record
Otherwise if there's a hostname, find that record
Otherwise use the IP number as the record index
Analyze datestamp
Is the most recent request within 30 minutes?
  Yes: add duration and other info to record
  No: log new visit for this user
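The steps above can be sketched in Python. This is a minimal illustration rather than the program described in this column: it keeps its records in in-memory dictionaries as a stand-in for a DBMS, it assumes the entries are already parsed and in chronological order, and the field names ("logname", "cookie", "hostname", "ip", "time") and function names are hypothetical.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)


def visitor_key(entry):
    """Pick the most reliable identifier present: LOGNAME, cookie, hostname, IP."""
    for field in ("logname", "cookie", "hostname", "ip"):
        value = entry.get(field)
        if value and value != "-":
            return (field, value)
    raise ValueError("entry has no usable identifier")


def count_visits(entries):
    """Count visits per visitor; entries must be in chronological order."""
    last_seen = {}  # visitor key -> time of that visitor's most recent request
    visits = {}     # visitor key -> number of visits logged so far
    for entry in entries:
        key = visitor_key(entry)
        previous = last_seen.get(key)
        # First request ever, or a gap longer than the timeout: new visit.
        if previous is None or entry["time"] - previous > VISIT_TIMEOUT:
            visits[key] = visits.get(key, 0) + 1
        last_seen[key] = entry["time"]
    return visits


t0 = datetime(1996, 2, 28, 6, 0, 0)
entries = [
    {"hostname": "spaghetti.west.edu", "time": t0},
    {"cookie": "211.63.0.255.8445454454", "hostname": "-",
     "time": t0 + timedelta(minutes=5)},
    {"hostname": "spaghetti.west.edu", "time": t0 + timedelta(minutes=10)},
    {"hostname": "spaghetti.west.edu", "time": t0 + timedelta(minutes=50)},
]
result = count_visits(entries)
print(result)
```

Here the hostname-only visitor is credited with two visits, because the last request arrives 40 minutes after the previous one, and the cookie holder with one. On a real log the dictionaries grow with every new visitor, which is where a database starts to earn its keep.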

Depending on how much detail you want, you can log successes and failures by HTTP status code (with 200 as a success, and other codes indicating other results); total bytes transferred; the duration of the visit; the pages visited; the path through those pages; and so on.
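As a rough sketch of what such a per-visit record might hold, the Python class below accumulates those details one request at a time. The class name Visit, its attributes, and the add_request method are assumptions made for this example. Two simplifications are worth noting: only requests that returned 200 are added to the path, and the duration is measured from the first request to the last, so time spent reading the final page is not counted.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Visit:
    start: datetime
    end: datetime
    status_counts: Counter = field(default_factory=Counter)
    total_bytes: int = 0
    path: list = field(default_factory=list)

    def add_request(self, time, status, size, page):
        """Fold one log entry into this visit's running totals."""
        self.end = max(self.end, time)
        self.status_counts[status] += 1
        self.total_bytes += size
        if status == 200:
            # Only successfully delivered pages become part of the path.
            self.path.append(page)

    @property
    def duration(self):
        """Time between the first and last request of the visit."""
        return self.end - self.start


t0 = datetime(1996, 2, 28, 6, 9, 53)
visit = Visit(start=t0, end=t0)
visit.add_request(t0, 200, 3828, "/film/reviews/D/dangerous.minds.horton.html")
visit.add_request(t0 + timedelta(minutes=2), 404, 0, "/film/missing.html")
visit.add_request(t0 + timedelta(minutes=7), 200, 1200, "/film/index.html")
print(visit.duration, visit.total_bytes, dict(visit.status_counts), visit.path)
```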

[ Web Log Analysis: Who's Doing What, When?:
Part 2 > ]



