# AnalyticBridge

A Data Science Central Community

Excel® is the most widely-used spreadsheet tool. The nontechnical persons like to use it for computation and analysis. But they usually find the formulas and functions available in Excel® are rather poor and the VBA® is just double Dutch to them for further analysis. Thus, a huge volume of data with valuable information has been wasted in vain.

Hereby introduce a better solution to realize the utmost value of Excel® by taking the esProc advantages of powerful computation ability, agile and easy-to-use analysis style, and programmed running mode.

I Case and Comparison

## Description

In an advisement agency, a Sales Director receives Client Reports from eight Regional Account Managers by every quarter. The Client Report is an Excel® spreadsheet, mainly comprising the client, sales value, and other information about the respective region, as given in the below figure: Suppose that the Sales Director wants to compare the big client across various regions. For example, regarding the client of whom the sales value ranks top 10%, 20% or 30%, what’s the average sales, and which enterprise is among the Top 500?
First, let’s have a look at the attempt to solve it with Excel® formulas.

## Excel® Formula Solution

The first step is to compute the average sales of clients whose sales values rank top 10%. We may adopt the following procedure: firstly, sort the sales value in descending order, and then use count( ) function to compute the total number of clients. Secondly, multiple the total number by 10%, and round the result to get the row number with the round ( ) function. Finally, copy these clients onto a new spreadsheet, and compute the average value. This procedure is not difficult for those who are familiar with Excel®.

Then, let’s proceed with this computation: How many of these big clients are among Top 500? To solve the problem, you need to get the intersection of the two datasets. In other words, this is to compute the common part of big client set and the Top 500 list from the previous step. The computational expression is:

=INDEX(A:A,SMALL(IF(COUNTIF(\$B\$2:\$B\$15,\$A\$2:\$A\$20),ROW(\$A\$2:\$A\$20),4^8),ROW(A1)))&”"

Since the above formula requires 5 various combinations of functions, it is a great challenge to compose it.

To make it worse, the computational procedure of Excel® requires the user to carry out manually and only acceptable for the specific Excel® spreadsheet. This is not as universal as a program, for example, if program, the whole computational procedure will rerun automatically on receiving different file names. In this case, there are 8 Excel® files. Excel® users will have to run the computational procedures for 8 times. Moreover, there are 3 rankings: 10%, 20%, and 30%, which means the computation will have to be repeated for 8X3=24 times.
It is obvious that it is too tough to solve this problem with Excel® formula.
Let’s try VBA®, the most powerful extension tool of Excel®.

## Excel VBA® Solution

Function Collection(a As Range, b As Range)
On Error Resume Next
Dim arr1(), arr2(), times, tmpindex
Set newcoll = CreateObject(“Scripting.Dictionary”)
With Application.WorksheetFunction
arr1 = .Transpose(a.Value)
arr2 = .Transpose(b.Value)
Do
times = .Mode(arr1, arr2)
If IsEmpty(times) Then
Exit Do
Else
tmpindex = .Match(times, arr1, 0)
arr1(tmpindex) = arr1(UBound(arr1))
If UBound(arr1) = 1 Then
arr1(1) = Empty
Else
ReDim Preserve arr1(1 To UBound(arr1) – 1)
End If
tmpindex = .Match(times, arr2, 0)
arr2(tmpindex) = arr2(UBound(arr2))
If UBound(arr2) = 1 Then
arr2(1) = Empty
Else
ReDim Preserve arr2(1 To UBound(arr2) – 1)
End If
times = Empty
End If
Loop
End With
arr3 = newcoll.keys
If newcoll.Count = 0 Then
Collection = False
Else
Collection = arr3
End If
End Function
Isn’t it unreadable and indigestible? The above “double Dutch” is only one step to compute the intersection set of several steps. Undoubtedly, VBA® needs a great programming capability and is by no means suitable for nontechnical persons.
Then, let’s check the impressive esProc solution below.

## A better Solution

 A B 1 =file(rangeFile)[email protected]() 2 =A1.sort(amount:-1) =round(A2.count()*percent) 3 =A2(to(B2)) 4 =A3.(amount).avg() 5 6 =file(“C:\\500.xls”)[email protected]() 7 =A3.(customer)^A6.(500Name) 8 =A7.count()

A1 and A6: Retrieve the “Client Report” of a certain region respectively and “Top 500 list”. Please note that “rangeFile” is a parameter, and you can assign various file names to get various results. In addition, the “percent” in the B2 is also a parameter, for example, 10%, 20%, and 30%.
A2: Sort the data in A1 by sales value. The ”amount” is column name retrieved automatically, and the ”-1” represents the descending order.
B2: Compute the row number of clients ranking the top 10%, 20% or 30% respectively.
A3: Compute the clients from row 1 to B2 that are all big clients. Assume that B2 equals to 3, then “to(B2)” equals to ”1,2,3”.
A4: Compute the average sales value of big clients
A7: Compute the clients which not only big clients but also among the Top 500. In other words, this is to compute the intersection set of the customer column from the Client Report and the 500Name column from the Top 500. The ”^” represents the intersecting action.
A8: Compute the number of clients in the intersection set from the previous step.
As we can see, the style of esProc expression is similar to that of Excel®, agile and intuitive but more powerful in computing and capable to rerun just as a program does. It is a great analysis tool to empower the nontechnical persons who are familiar the style of Excel®.

Views: 680

Comment

Join AnalyticBridge Comment by Daisy Ding on July 12, 2012 at 7:17pm

I'm also quite appreciated for your further questions or advices so that we can do more for those Excel lovers! Nice weekends! Comment by Vita Zhang on July 12, 2012 at 3:49am

well, that's really great. Comment by Daisy Ding on July 11, 2012 at 3:54am

It supports all major versions, and could run normally on data over 2 billion lines, of course, it has to be based on your huge enough memory Comment by Vita Zhang on July 10, 2012 at 1:52am

how about the data volume? For example, does it support excel 2007 or data over 65,535 lines?