Subscribe to DSC Newsletter

Data alignment join in Java for easier text analytics

The join statements of the database can be used conveniently to perform the operation of alignment join. But sometimes the data is stored in the text files, and to compute it in Java alone we need to write a large number of loop statements. This makes the code cumbersome. Using esProc to help with programming in Java can solve the problem easily and quickly. Let’s look at how this works through an example.

The text file emp.txt contains employee information, except that in which EId is 1. Anothertext file sOrder.txt contains information of sales orders in which field SellerId corresponds to field EId in emp and from which the information whose SellerId is 2 is excluded. Part of the original data is listed below:

         emp.txt

sOrder.txt:

It is required to join the three fields: Name, Dept and Gender, in emp to sOrder in alignment and output the computed result to a new file. The expected result is as follows:

Code written in esProc:

In cells A1 and A2 respectively, data is imported from the two text files and stored in two variables: emp and sOrder. Here import function uses tab as the column separator by default. Option @t represents the first row will be imported as the field names. Because only some of the fields in emp.txt are needed, the code in A1 uses the names of these desired fields as parameters. After execution, values of emp and sOrder are as follows:

In the code in A3: [email protected](sOrder:s,SellerId;emp:e,EId), join function performs the operation of alignment join and changes the names of the two tables to s and e respectively. Option @1 represents the left join which is in line with the requirement of the example: join emp to sOrder in alignment. The computed result is as follows:

Click the numbers in blue and we can see the detailed information, as shown below: 

esProc can also be used to realize the right join which only requires exchanging positions of data in alignment. For example, to align sOrder according to emp, we just need to exchange their positions in the code, that is, [email protected](emp:e,EId;sOrder:s,SellerId). The computed result is as follows: 

It is also easy to realize the full join using option @f. The code is [email protected](sOrder:s,SellerId;emp:e,EId). The computed result is as follows:

There are altogether four operations of alignment join: left join, right join, full join and inner join. By default, join function is used to execute the inner join, the code is =join(sOrder:s,SellerId;emp:e,EId). The computed result is as follows:

Let’s get back to the example. The code in A4: =A3.new(s.OrderID, s.Client, s.SellerId, s.Amount, s.OrderDate, e.Name, e.Dept, e.Gender), is for getting the desired fields from table e and creating a new structured two-dimensional table. The computed result is as follows:

Now the alignment is done and data needs to be exported to a new file. The code for this is =file("E: \\result.txt")[email protected](A4). In export function, tab is by default the column separator and option @t represents the field names are exported to the first row. Open result.txt and we can see information as follows:

The script in the above has finished exporting all aligned data to the new file, what we will do next is to call the script in Java.

         // connect esProc to jdbc

         Class.forName("com.esproc.jdbc.InternalDriver");

         con= DriverManager.getConnection("jdbc:esproc:local://");

         // call esProc script; the name of the script file is test

         st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test()");

         // execute esProc stored procedure

         st.execute();

By executing the above Java code, emp will be joined to sOrder in alignment and the result will be output to file result.txt.

If the requirement is changed to this: query data in sOrder according to dynamic periods of time, execute the same operation of alignment join and return the result directly to Java. To complete the task esProc needs to define two parameters: begin and end, to represent starting time and ending time respectively. The esProc code is as follows:

The code in red has been modified.      

A2:Filter sOrder again using select function according to the starting and ending time passed from Java, that is, @begin and @end.

A5:Output the computed result in A4 to JDBC interface.

And Java code should be modified too to pass parameters to esProc code and get the final result. The modified code is as follows:

Class.forName("com.esproc.jdbc.InternalDriver");

         con= DriverManager.getConnection("jdbc:esproc:local://");

         st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test(?,?)");

         st.setObject(1,startTime);

         st.setObject(2,endTime);

         st.execute();

         ResultSet set = st.getResultSet();

 

Views: 943

Tags: Java, alignment, analytics, data, file, in, join, left, table, text

Comment

You need to be a member of AnalyticBridge to add comments!

Join AnalyticBridge

On Data Science Central

© 2019   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC   Powered by

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service