A Data Science Central Community
This post brings forth to the audience, few glimpses (strictly) of insights that were obtained from a case of how predictive analytic's helped a fortune 1000 client to unlock the value in their huge log files of the IT Support system. Going to quick background, a large organization was interested in value added insights (actionable ones) from thousands of records logged in the past, as they saw both expense increase at no higher productivity.
As, most of us know in these business scenarios end-users will be much interested in out-of-knowledge, strange and unusual things that may not be captured from regular reports. Hence, here data scientist job not only ends at finding un-routine insights, but, also needs to do a deeper dig for its root cause and suggest best possible actions for immediate remedy (knowledge of domain or other best practices in industry will help a lot). Further, as mentioned earlier, only few of those has been shown/discussed here and all the analysis has been carried out using R Programming Language components viz., R-3.2.2, RStudio (favorite IDE), ggplot2 package for plotting.
The first graph (below one) is a time series calendar heat map adopted from Paul Bleicher, shows us the number of tickets raised day-wise over every week of each month for the last year (green and its light shades represent less numbers, where as red and its shades represent higher numbers).
Herein, if one carefully observe the above graph, it will be very evident for us that, except for the month of April & December, all other months have sudden increase in the number of tickets raised over last Saturday's and Sunday's; and this was more clearly visible at Quarter ends of March, June, September (also at November which is not a Quarter end). One can think of this as unusual behavior as numbers raising at non-working days. Before, going into further details, lets also look at one more graph (below), which depicts solved duration in minutes on x-axis and their respective time taken through a horizontal time line plot.
The above solved duration plot show us that out of all records analyzed 71.87% belong to "Request for Information" category and they have been solved within few minutes of tickets raised (that's why we cannot see a line plot for this category as compared to others). So, what's happened here actually was a kind of spoof, because of lack of automation in their systems. In simple words, it was found that there doesn't exists a proper documentation/guidance for many of applications they were using; such situation was taken as advantage for increasing the number of tickets (i.e. nothing but, pushing for more tickets even for basic information in the month ends and quarter ends, which resulted in month end openings which in turn forced them to close immediately). Discussed one here is one of those among many which has been presented with possible immediate remedies which can be easily actionable.