Subscribe to DSC Newsletter

Improve the Computation and Analysis Ability in Java

Java is the most widely used programming language with an outstanding architecture. It is the top preferred language to develop the enterprise application. However, Java is not fit for the mass data computation. If encountering the computation that is too complex to be represented in a single SQL statement or it is not allowed to add stored procedures to database, then drawbacks of Java would be highlighted.

Hereby is a case on how to deal with above questions with JAVA. 

Case Description

In an information system of Java architecture in a certain merchant bank, it is required to compute the Moving Average (MA) of the monthly loan amount in this year. The row data is stored in the loan table of a database, mainly comprising the field time (i.e. date of loan), and the field amount (i.e. amount of loan). Some data is shown as given in the below table:

Algorithm of Moving Average: average of the 3 neighboring months, for example, the Moving Average in March is February, March, and April. In addition, the first record and the last record only have 2 neighboring months.

It is easy to compute the monthly loan amount because you are only required to group the data of loan by month via SQL and then sum up the amount of loan. However, the Moving Average involves the “Relative Position” and “Computation between Rows”. Therefore, it is not easy to represent via SQL. In addition, adding stored procedures to the database of bank is under strict control. Usually, such computation is implemented with Java.

Java Solution

First, use SQL to perform the group and sum operations. Then, use Java to compute Moving Average.

In Java solution, loops and traversals across the result sets are required to handle the below three situations: 1. Regarding the data from the first record, only perform the average operations on the current and the next records; 2. Regarding the data from the last record, only perform the average operations on the current and the previous records; 3. Regarding the other situations, only perform the average operation on the current record, previous record, and the next record.

Java is the traditional practice. Now we can try a new alternative.

A script plus Java Solution

A

1

=db.query(“select sum(amount)amount,to_char(time,’MM’) month from loan where to_char(time,’yyyy’)=to_char(sysdate,’yyyy’) group by to_char(time,’MM’) order by month”)

2

=A1.derive(~{-1,1}.(AMOUNT).avg())

3

result A2

Firstly, compute in the esProc IDE as follows:
       In A1, execute the simple summarization in groups via SQL;
       In A2, proceed with the computation and compute the Moving Average.
       In this procedure, ”~” represents every record in the group, and ”{startPosition,endPosition}” represents the relative scope of range; ”{-1,1}” represents the range from the -1 position to the 1 position (3 records in total) relative to the current record.
       We save the above cellsets as ”ma3.dfx” and call the computation in Java:

Java application can call esProc just like calling the stored procedure through the standard JDBC interface. In which, the name of stored procedure is just the cellset name.

As we can see in this case, if using Java alone to solve this problem, then the computation logic is complex, the workload of coding is heavy, and it is not easy for code maintenance. For example, to compute the Moving Average of the 5 months in a row, or the Moving Average of any N months, the developer will have to code all over again to solve this problem. In fact, in order to facilitate the use, more codes are required to convert the type of current returns to ”List+Map” or ”ResultSet”.
The script solution provides the agile syntax and powerful library functions to solve such problem easily. In addition, it is easy for scaling and maintenance, such as computing the Moving Average of the five months. The developers are only required to change the range of data in A2 from {-1,1} to {-2,2}. If computing the Moving Average of the N months, the developer can simply replace the range with the input parameters. Finally, the result set itself of esProc is the easy-to-use ”ResultSet” type that is easy for Java application to directly use it.
So, cooperation of JAVA and script solution could uplift the computational capacity of Java greatly.

 

Views: 303

Tags: JAVA, SQL, analysis, computing, data

Comment

You need to be a member of AnalyticBridge to add comments!

Join AnalyticBridge

On Data Science Central

© 2019   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC   Powered by

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service