

Speed Up Google Analytics with urchin.js


September 14th, 2007

Update: This article is a bit dated. For the newest Google Analytics speed tips, check out the 3-part update about Hosting, Cookie Domains, and JavaScript Loading and Failproofs.


Ever notice that your site sometimes takes a while to finish loading because Google's urchin.js file is taking forever? You may recognize the problem when you see something like "Transferring data from google-analytics.com..." in your browser's status bar.


Time To Setup?

4 minutes.

I got tired of seeing that all the time, so I set up an automated cron job that runs every 12 hours, downloads the latest version from Google, and saves it to my website directory. Then I reference /urchin.js instead of http://www.google-analytics.com/urchin.js, and my site loads a lot faster! Take a look at the source of this page if you want to see what is going on (look at the bottom).

There are 2 pretty major things you accomplish by hosting urchin.js locally:

  1. You enable persistent connections.
  2. You ensure that the correct 304 Not Modified header is sent back to your site visitors instead of the entire file being re-served.

The urch.sh shell script

Create a shell script called urch.sh; cron will run it once a day or once a week, whatever you want. The following script removes the local urchin.js, then uses wget to download the latest version into the local directory.
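#!/bin/sh
# Remove the old local copy (-f: no error if it is already gone)
rm -f /home/user/websites/askapache.com/z/j/urchin.js
# Switch to the directory the site serves /z/j/ from; stop if it is missing
cd /home/user/websites/askapache.com/z/j/ || exit 1
# Fetch the latest urchin.js from Google into the current directory
wget http://www.google-analytics.com/urchin.js
# Make the file world-readable so the web server can serve it
chmod 644 /home/user/websites/askapache.com/z/j/urchin.js
# Return to the directory we started in
cd "${OLDPWD}"
exit 0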

Improved shell script

I realized right away that I needed a more modular shell script. I admin 50+ websites, and it would be silly to type the same block of code 50 times and wget the same file 50 times. So this version downloads urchin.js into a temporary directory once, then copies it over the old file in each site's directory: 1 GET request for any number of sites. The example below handles just 2 sites.

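#!/bin/sh

# SETTINGS
TMP=${HOME}/sites/tmp/     # scratch directory for the single download
SH=${HOME}/sites/          # parent directory that holds every site
SJ=/public_html/z/j/       # path inside each site where urchin.js is served
UA="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3"

# SITES
S1=htaccesselite.com
S2=askapache.com

# RESOURCE URLS
L1=http://www.google-analytics.com/urchin.js

# SETS CORRECT PERMS AND COPIES TO EACH SITE'S DOC_ROOT
setit(){
  chmod 644 *.js
  cp *.js "$SH$1$SJ"
}

cd "$TMP" || exit 1

# Download once. --header "Pragma:" removes any Pragma header curl would send;
# --fail makes curl exit non-zero on an HTTP error, and the script then stops
# before copying, so the live copies are left untouched
curl --fail --header "Pragma:" -A "${UA}" --retry 200 --retry-delay 15 -O "${L1}" || exit 1

setit "$S1"
setit "$S2"

cd "${OLDPWD}"

exit 0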
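The script above gives each site its own variable and its own setit call. If you manage many sites, a loop over a list of site names may be easier to maintain. This is a minimal sketch, assuming the same ${HOME}/sites/<site>/public_html/z/j/ layout; deploy_js is a hypothetical helper name, and the copy-then-rename step is an addition not found in the original script, meant to keep visitors from ever fetching a half-written file.

```sh
#!/bin/sh
# Hypothetical helper, not part of the original script: copy one downloaded
# file into ROOT/<site>/public_html/z/j/urchin.js for every SITE given.
# Usage: deploy_js FILE ROOT SITE [SITE...]
deploy_js() {
  file=$1
  root=$2
  shift 2
  # Refuse to deploy a missing or empty download
  if [ ! -s "$file" ]; then
    echo "deploy_js: $file is missing or empty" >&2
    return 1
  fi
  for site in "$@"; do
    dest=$root/$site/public_html/z/j
    if [ ! -d "$dest" ]; then
      echo "deploy_js: skipping $site ($dest not found)" >&2
      continue
    fi
    # Copy under a temporary name, then rename: a rename within one
    # directory is atomic, so visitors never fetch a half-written file
    cp "$file" "$dest/.urchin.js.new" &&
      chmod 644 "$dest/.urchin.js.new" &&
      mv -f "$dest/.urchin.js.new" "$dest/urchin.js" || return 1
  done
  return 0
}
```

In the script above, the two setit lines could then become a single call such as deploy_js "${TMP}urchin.js" "${HOME}/sites" htaccesselite.com askapache.com, and adding a site means adding one name to that line.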

The crontab

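Make the script executable (for example, chmod 755 urch.sh), then run crontab -e and add this line to run it once a day at 12:11:
11 12 * * * /home/user/websites/urch.sh >/dev/null 2>&1
Or, to check just once a week (Saturdays at 2:00 am):
0 2 * * 6 /home/user/websites/urch.sh >/dev/null 2>&1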
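If you would rather not edit the crontab by hand, one option is to pipe the current crontab through a small filter and back into crontab -. This is a minimal sketch assuming a POSIX shell; add_cron_line is a hypothetical helper name, and it appends the line only when an identical line is not already present, so running it twice is harmless.

```sh
#!/bin/sh
# Hypothetical helper: print the crontab read on stdin, plus LINE at the end
# unless an identical line already exists.
# Usage: crontab -l 2>/dev/null | add_cron_line 'LINE' | crontab -
add_cron_line() {
  line=$1
  existing=$(cat)
  # Re-emit the current entries unchanged (if there are any)
  [ -n "$existing" ] && printf '%s\n' "$existing"
  # -F: fixed string, -x: whole-line match, -q: no output
  printf '%s\n' "$existing" | grep -Fqx -- "$line" || printf '%s\n' "$line"
}
```

For example: crontab -l 2>/dev/null | add_cron_line '11 12 * * * /home/user/websites/urch.sh >/dev/null 2>&1' | crontab -. Since crontab - replaces the whole crontab with what it reads, it is worth running the command once without the final | crontab - to check the output first.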
Finished! Read on for more in-depth overkill.

In the past year urchin.js has been updated only once, yet the Last-Modified header shows a new date on every request, and not even in a linear fashion, I might add! The problem shows up when requests for urchin.js on google-analytics.com spike, even with the load-balancing that is obviously in place. When that happens, your browser requests urchin.js, but instead of an immediate connection and transfer you get a lagging transfer. One reason is that the server urchin.js is served from does not allow persistent connections.
This object will be fresh for 1 week. It has a validator present, but when a conditional request was made with it, the same object was sent anyway. It doesn't have a Content-Length header present, so it can't be used in a HTTP/1.0 persistent connection.
Another big reason: even though google-analytics.com sets correct Cache-Control headers when serving urchin.js, it does not answer an If-Modified-Since request with a 304 Not Modified response to indicate the file has not changed. Instead it returns the entire urchin.js file again, which defeats the caching. You can see the problem in this Wireshark capture of one exchange:
GET /urchin.js HTTP/1.1
Accept: */*
Referer: http://www.askapache.com
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
If-Modified-Since: Tue, 20 Mar 2007 22:49:11 GMT
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SU 2.011; .NET CLR 1.1.4322; .NET CLR 2.0.50727; Alexa Toolbar; .NET CLR 3.0.04506.30)
Host: www.google-analytics.com
Connection: Keep-Alive

HTTP/1.1 200 OK
Cache-Control: max-age=604800, public
Content-Type: text/javascript
Last-Modified: Tue, 20 Mar 2007 22:54:02 GMT
Content-Encoding: gzip
Server: ucfe
Content-Length: 5675
Date: Sat, 24 Mar 2007 18:23:12 GMT
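To check how a server answers a conditional request like the one in this capture, you can dump the response headers with curl and look at the status line. This is a minimal sketch assuming a POSIX shell with awk; check_revalidation is a hypothetical helper name, not part of curl or any other tool.

```sh
#!/bin/sh
# Hypothetical helper: read an HTTP response header block on stdin and report
# whether the server revalidated the cached copy (304) or re-sent it (200).
check_revalidation() {
  awk '
    { sub(/\r$/, "") }                  # drop the CR of CRLF line endings
    NR == 1 { status = $2 }             # status line, e.g. "HTTP/1.1 200 OK"
    tolower($1) == "content-length:" { len = $2 }
    END {
      if (status == "304")
        print "304: cached copy revalidated, no body sent"
      else if (status == "200")
        print "200: full file re-sent (" (len == "" ? "unknown" : len) " bytes)"
      else
        print "unexpected status: " status
    }'
}
```

For example, replaying the If-Modified-Since date from the capture against your own copy (the example.com URL is a placeholder): curl -s -D - -o /dev/null -H 'If-Modified-Since: Tue, 20 Mar 2007 22:49:11 GMT' http://www.example.com/urchin.js | check_revalidation. A 200 is also the correct answer if the file really did change after that date, so compare the date you send with the Last-Modified value the server reports.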
Of course you should implement your own caching scheme for best results.

So there are 2 pretty major things you gain by using a locally hosted version of the urchin.js file:

  1. Persistent connections
  2. Correct 304 Not Modified responses
Still, this only matters if you notice lag from the google-analytics.com site, which happens from time to time.

Other Speed Tips

To truly speed up your site, you will want to implement a server-side caching technique. It also helps to read up on HTTP caching and web caches in general.

Apache Caching Guide

This document supplements the mod_cache, mod_disk_cache, mod_mem_cache, mod_file_cache and htcacheclean reference documentation.

It describes how to use Apache's caching features to accelerate web and proxy serving, while avoiding common problems and misconfigurations.



Reader Comments

  1. DaveA ~December 9, 2009 @ 6:16 pm

    urchin.js is evil. The more I review this with proxies, the more I hate it.

  2. vahur ~September 15, 2008 @ 8:41 am

    A good (fast) idea in addition to host ga.js in your own server is to merge multiple .js files together so that there will be less http requests.
    I have multiple .js files, and after editing some of them i merge multiple files together and compact the contents so that the original source remains untouched.

  3. Majika ~January 26, 2008 @ 10:39 am

    Hi,

    First off great work, This is a bit of a n00b! question as I am crap at Cron Jobs and shell scripts all together (kinda wonder how I have a website to begin with) well anyway,

    I am a bit baffled about the /websites/ bit in the first of your shell script examples:

    #!/bin/sh
    rm /home/user/websites/askapache.com/z/j/urchin.js
    cd /home/user/websites/askapache.com/z/j/
    wget http://www.google-analytics.com/urchin.js
    chmod 644 /home/user/websites/askapache.com/z/j/urchin.js
    cd ${OLDPWD}
    exit 0;

    I think I would fill the script out like this Please tell me if this is correct ?

    #!/bin/sh
    rm /home/sellitor/websites/site/z/j/urchin.js
    cd /home/sellitor/websites/site/z/j/
    wget http://www.google-analytics.com/urchin.js
    chmod 644 /home/sellitor/websites/site/z/j/urchin.js
    cd ${OLDPWD}
    exit 0;

    is this correct? or do I have to change the '/websites/' to one of my sub-domains whichever one I am using Google Analytics with ?

    for instance I would place '/shop/' in place of '/websites/'

    Cheers,

  4. AskApache ~November 26, 2007 @ 3:05 pm

    @ Phillip

    That's a very good method and will work for most people, but I personally am kinda fanatical about security, so each of my sites has its own username and usergroup and is totally separated from the others. That way, if one falls, the others don't go down like dominoes.

  5. Phillip ~November 23, 2007 @ 10:31 am

    Neat idea on the cron job. Wouldn't it be simpler to symlink to one copy of urchin.js from all of your project directories? Then no matter how many sites you have using it, you don't have to change your shell script. I do that a lot with code that I want to share across multiple sites. I very much dislike having to make the same change in multiple places.

    Just a thought.

    Peace.

  6. Ray ~October 8, 2007 @ 3:55 pm

    Where is it getting the values for things like:
    {OLDPWD}
    and
    {HOME}

    Thanks

  7. AskApache ~October 3, 2007 @ 11:12 pm

    @ brad

    Yo, you could in fact combine urchin.js with your site's other .js file. Check out my apache.js file to see how much you can cram into 1 file. Once a browser visits any page on my site, it never has to request that file again: it is cached forever, and the browser doesn't even send If-Modified-Since or compare ETags, because I removed the Last-Modified and ETag headers from my .js file. So what if I update the .js? I just rename the .js file in my .html (which is only cached for 7200 seconds and then DOES revalidate) and presto! My site is sooo fast!

  8. AskApache ~August 28, 2007 @ 5:37 pm

    I am thrilled to learn that some genius over at Drupal has already created some really interesting code to accomplish this task in a simpler way. Very cool, check it out

  9. Brad ~July 23, 2007 @ 3:23 pm

    Not to be a pain, but did you consider the caching benefits had by serving the file from Google's servers? Once you visit one site that loads http://www.google-analytics.com/urchin.js it is in the browser's cache for subsequent hits to that site and even when hitting other sites that also use Google Analytics.

    This is similar to the reason that Yahoo hosts their YUI library files on Yahoo servers for all sites to link to and benefit from caching. With this scheme in place the most a browser should have to do is check the freshness of the file (and usually receives an HTTP 304).

    Regardless, Google Analytics should be loaded after all of your core site content files and thus shouldn't impact apparent load time for users in the least. The stated need for additional speed is purely pedantic on the part of those who watch the status bar ("Transferring data from google-analytics.com") -- hint: less than 10% of normal people.

  10. bigtime ~March 28, 2007 @ 4:16 pm

    In your shell script it may be a good idea to wget the js file first with a different name, force a copy over the existing js file and then remove the temporary download file. This way you don't risk getting caught without a js file during one of those google "lag" times.

    for example...

    #!/bin/sh
    cd /home/user/websites/askapache.com/z/j/
    wget -O urchin.latest.js http://www.google-analytics.com/urchin.js
    chmod 644 /home/user/websites/askapache.com/z/j/urchin.latest.js
    cp --force urchin.latest.js urchin.js
    rm /home/user/websites/askapache.com/z/j/urchin.latest.js
    cd ${OLDPWD}
    exit 0;

  11. Adam Dempsey ~March 22, 2007 @ 5:30 am

    Very nice idea! Is this within Analytics TOS though?


It's very simple - you read the protocol and write the code. -Bill Joy

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License, just credit with a link.
This site is not supported or endorsed by The Apache Software Foundation (ASF). All software and documentation produced by The ASF is licensed. "Apache" is a trademark of The ASF. NCSA HTTPd.
UNIX ® is a registered Trademark of The Open Group. POSIX ® is a registered Trademark of The IEEE.

