Friday, March 7, 2014

clickstream analysis with Splunk

Analyzing web server log files is a common use case for Splunk. Usually people count page clicks, analyze status codes, and so on.

A while ago, I was asked whether it would be possible to extract and analyze a user path with Splunk, using only our raw web server log files. My short answer was "yes, of course". But then it took me some time to get my final search working correctly.

The result looks quite simple. Here it is.


base search

Define your initial search and extract the two fields "page" and "client_id". It should look something like this:

> index=your_weblogs sourcetype=your_st | 
  fields page client_id


Hint: I've put my base search in a macro called "base_search" so that I can reuse it in the following searches.

current page and next_page

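With our base search at hand, let's determine the current page and the next_page for each event, per client. Splunk returns events newest first by default, so with current=f the last(page) value seen so far for a client is the page that client visited next: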

> `base_search`| 
   streamstats current=f last(page) as next_page by client_id 
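
To make the streamstats step concrete, here is a minimal Python sketch of the same pairing logic outside Splunk. It assumes the events arrive newest first (Splunk's default result order), so the previously seen event for a client is the click that happened next in time. The function name `add_next_page` and the sample events are made up for illustration.

```python
def add_next_page(events):
    """Mimic `streamstats current=f last(page) as next_page by client_id`.

    `events` must be ordered newest first, as Splunk returns them by default.
    """
    last_seen = {}  # client_id -> page of the most recently processed event
    result = []
    for event in events:
        client = event["client_id"]
        # current=f: use the value seen *before* this event (None if there is none)
        result.append({**event, "next_page": last_seen.get(client)})
        last_seen[client] = event["page"]
    return result


# Hypothetical sample data, newest event first
events = [
    {"client_id": "a", "page": "payment"},
    {"client_id": "b", "page": "cart"},
    {"client_id": "a", "page": "cart"},
    {"client_id": "a", "page": "login_page"},
    {"client_id": "b", "page": "login_page"},
]

for row in add_next_page(events):
    print(row["client_id"], row["page"], "->", row["next_page"])
```

The newest event of each client has no next_page, which matches what the Splunk search produces for a client's last click.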

Finally, let's count how often each transition from page to next_page occurs:

> `base_search`| 
  streamstats current=f last(page) as next_page by client_id |
  stats count(next_page) as count by page next_page
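
As a rough illustration of what the stats step computes, the following Python sketch counts transitions from rows that already carry a next_page value. The function name `count_transitions` and the sample rows are assumptions for this example only.

```python
from collections import Counter


def count_transitions(rows):
    """Mimic `stats count(next_page) as count by page next_page`."""
    counts = Counter()
    for row in rows:
        # Rows without a next_page (a client's last click) drop out,
        # just as stats skips events where a by-field is missing.
        if row.get("next_page") is not None:
            counts[(row["page"], row["next_page"])] += 1
    return counts


# Hypothetical rows, as they might look after the streamstats step
rows = [
    {"page": "payment", "next_page": None},
    {"page": "cart", "next_page": "payment"},
    {"page": "login_page", "next_page": "cart"},
    {"page": "login_page", "next_page": "cart"},
    {"page": "login_page", "next_page": "home"},
]

for (page, next_page), count in count_transitions(rows).most_common():
    print(page, "->", next_page, count)
```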

This search should give you a table with all transitions from page to next_page across all users. If you'd like to analyze one specific page, just append a search command.

To see where people most often go after the "login_page":

> `base_search`| 
  streamstats current=f last(page) as next_page by client_id  |
  stats count(next_page) as count by page next_page |
  search page="login_page"

To see where people most often come from before they reach the "payment" page:

> `base_search`| 
  streamstats current=f last(page) as next_page by client_id  |
  stats count(next_page) as count by page next_page |
  search next_page="payment"
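
The two filters above can be sketched in Python as one small helper over the transition table. Note that ranking by count is an addition of this sketch and is not part of the searches above. The name `top_transitions` and the numbers in `table` are invented for illustration.

```python
def top_transitions(table, page=None, next_page=None):
    """Filter (page, next_page, count) rows and sort by count, highest first."""
    selected = [
        row for row in table
        if (page is None or row[0] == page)
        and (next_page is None or row[1] == next_page)
    ]
    return sorted(selected, key=lambda row: row[2], reverse=True)


# Hypothetical transition table: (page, next_page, count)
table = [
    ("login_page", "home", 40),
    ("login_page", "cart", 120),
    ("cart", "payment", 75),
    ("home", "payment", 5),
]

# Where do people go after the login page?
print(top_transitions(table, page="login_page"))
# Where do people come from before the payment page?
print(top_transitions(table, next_page="payment"))
```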


visualize with sankey

One way to visualize the results is a d3.js sankey diagram (http://bost.ocks.org/mike/sankey).
You can include it in your Splunk dashboard and get a simple but useful path analytics page.
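
A sankey layout typically wants the data as a list of nodes plus a list of links between them. The Python sketch below converts a transition table into such a structure. The exact shape (nodes with a "name", links with numeric "source" and "target" indices and a "value") is an assumption based on the sample data of the linked example, so check it against the sankey version you use. The function name `to_sankey` and the sample table are hypothetical.

```python
import json


def to_sankey(table):
    """Convert (page, next_page, count) rows to a nodes/links structure."""
    nodes = []  # [{"name": page}, ...]
    index = {}  # page name -> position in nodes
    links = []
    for page, next_page, count in table:
        for name in (page, next_page):
            if name not in index:
                index[name] = len(nodes)
                nodes.append({"name": name})
        links.append(
            {"source": index[page], "target": index[next_page], "value": count}
        )
    return {"nodes": nodes, "links": links}


# Hypothetical transition table: (page, next_page, count)
sample = [
    ("login_page", "cart", 120),
    ("cart", "payment", 75),
]

print(json.dumps(to_sankey(sample), indent=2))
```

Click paths can contain cycles (from A to B and back to A), which a sankey layout may not handle well. This sketch does not deal with that case.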





A blog entry with code samples showing how to include sankey in a Splunk dashboard will follow. Post a comment if you'd like to see it soon.

4 comments:

  1. Thank you for the useful post! It would be great to see the continuation about including sankey in a Splunk dashboard.

    1. Didn't have much time to update the post.
      Have a look at http://apps.splunk.com/app/1772/ where I've implemented d3.js sankey and sunburst for clickstream analysis in a Splunk 6 app, together with other Simple XML examples.

      Hope this helps and I'll hopefully be back soon to update the post..

  2. Very useful tutorial on clickstream analysis. Thanks for sharing. Keep up the good work.