lice! : Quick & dirty visitor stats for rails

Hey everyone,

I am sure there are hundreds of nice ways to keep track of how many people visited a Rails application: log analyzers for Apache logs, free web counters, self-written code, or gems that are floating around the net. I agree all of them are nice, and I currently use the neat tool from Google, which has a lot of features. But in the first days, before I had set it up, I was looking for just a simple way to see how many hits the website had, without any nifty graphics, HTML output or referrer analysis. I was surprised not to find anything about it on Google right away.

Well, a solution had to be found, and as a Ruby fan it surely had to include Ruby. The original code I tossed together after a minute or two was this (all assuming you’re in the log directory of your Rails app):

    cat production.log | grep "^Processing" > out.tmp && ruby -e 'data = Hash.new;File.open("out.tmp").each{|l| data[ip = l.split(" ")[3]] = (data[ip] || 0) + 1}; puts data.length'

Now it is ugly, using a temp file and all that, but it worked, which was the main idea behind it (quick and dirty; very dirty, I must admit).

A few days later I talked with a friend about the topic and we came up with a slightly nicer way to do things:

    cat production.log | grep "^Processing" | ruby -e 'ips = Hash.new(0); while gets; ips[$_[/\s.*?\s.*?\s(\S+)/,1]] += 1; end; puts ips.length'

While this was already a nicer solution (and still terribly ugly to read), it worked and gave the same number as the first line of code. The basic idea in both of them is to build a hash keyed by the IPs that accessed the website and then print the length of the hash, which is the number of distinct IPs.

Both are nice, but in the end there is a much simpler way without Ruby (as much as I love the language). Here is what came out after a bit more pondering; it doesn't use Ruby any more, just console commands:

    cat production.log | grep "^Processing" | awk '{print $4}' | sort -u | wc | awk '{print $1}'

A short, detailed description of what happens:
`cat production.log` outputs the content of the logfile itself, and the `|` pipes that output to the next command (as if one wrote it to that command's STDIN).

The next step is to filter the log. `grep "^Processing"` takes care of this: it only keeps lines that start with the word Processing, which look like this (I masked the IP in that line):

    Processing ArticlesController#permalink (for xxx.xxx.xxx.xxx at 2006-08-01 20:53:29) [GET]

The following command, `awk '{print $4}'`, gets the 4th word of the line, which in this case is the IP the request came from. Now we have a list with the IP of every request in the log. As there are still multiple lines from the same IP in there, we need to filter them. `sort -u` takes care of this: it sorts the list and removes duplicate entries, such as the same IP appearing twice.

Now all that is left is to count the lines in this list. `wc` is our friend here. It returns three values (lines, words and bytes); the second and third don't really interest us, but the first does, so again `awk` fetches the word we want from the resulting line. The last command in the chain is `awk '{print $1}'`.

Done. Enjoy it or not ;) Most other tools are more precise and give more information. Still, it is a quick way to get a rough idea, and I found good use for it.
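
For anyone who would rather stay in Ruby, here is a more readable sketch of the same hash-counting idea as the one-liners above. It assumes the log format from the example line, with the IP as the 4th word of every `Processing` line; the helper name `hits_per_ip` is made up for illustration and is not part of Rails.

```ruby
# Sketch only: assumes log lines shaped like
#   Processing SomeController#action (for <ip> at <date> <time>) [GET]

# Hypothetical helper: returns a hash of IP => number of requests.
def hits_per_ip(lines)
  counts = Hash.new(0)
  lines.each do |line|
    # Same filter as grep "^Processing"
    next unless line.start_with?("Processing")
    # 4th word of the line, the same field awk's $4 picks
    ip = line.split(" ")[3]
    counts[ip] += 1 if ip
  end
  counts
end

# Read the log given on the command line (default: production.log)
# and print the number of distinct IPs, but only if the file exists.
log = ARGV[0] || "production.log"
puts hits_per_ip(File.foreach(log)).length if File.exist?(log)
```

Saved under a name of your choice and run in the log directory, this should print the same number as the shell pipeline, and the returned hash also tells you how many requests each IP made.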

Posted by Heinz N. 'Licenser' Gies Fri, 11 Aug 2006 20:59:00 GMT


Comments

  1. Sheldon Hearn about 1 hour later:
    For low volume sites, you could try sitemeter.com. I stumbled over them while trying to set up site stats for my blog. I'm now using their commercial offering for a customer's site and I'm impressed. Wicked JavaScript tricks for tracking what links people click on that have them leave your site, that sort of thing.
  2. Heinz N. 'Licenser' Gies about 1 hour later:
    Thanks, I'll certainly look into that. I set up a Google Analytics account myself as the features are just nice (and as slow as my server loads the pages, it makes no difference ;). But I'll have a look, if only to see what else is around. So again, the one-line shell code was just a quick trick to get a general idea ;)