A Data Science Central Community
When importing structured text files into a database using Java alone, we have to assemble the SQL statements manually and handle a number of troublesome cases: whether to update or insert when the table already contains data, whether certain fields are present in the file, and whether the fields in the file are consistent with those in the table.
When esProc is used alongside Java, these problems can be solved…Continue
Added by Lynn Guo on December 22, 2014 at 7:04pm — No Comments
Some text files are too big to be loaded into memory in their entirety, yet because the data have been sorted by a certain column, they can be imported group by group according to that column, with each group small enough to fit in memory for computing. Examples include the call detail records of a telecom company, visitor statistics of a website, and member information of a shopping mall.
A great deal of complicated code, which is difficult to maintain, is…Continue
Added by Lynn Guo on December 15, 2014 at 6:24pm — No Comments
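The group-by-group import described in the post above can be sketched in plain Java as follows. This is only a minimal illustration, not the post's own solution: the class name `SortedGroupReader`, the tab delimiter, and the sample rows are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical helper: streams a text source that is already sorted by one
// column and hands each run of equal keys to a callback, one group at a time,
// so only the current group is held in memory.
final class SortedGroupReader {

    // keyIndex is the 0-based position of the sorted column. Fields are
    // assumed to be tab-separated (an assumption, not stated in the post).
    static void forEachGroup(Reader source, int keyIndex,
                             BiConsumer<String, List<String[]>> handler) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String currentKey = null;
            List<String[]> group = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split("\t", -1);
                String key = fields[keyIndex];
                // A change of key closes the previous group.
                if (currentKey != null && !key.equals(currentKey)) {
                    handler.accept(currentKey, group);
                    group = new ArrayList<>();
                }
                currentKey = key;
                group.add(fields);
            }
            // Flush the last group, if any rows were read.
            if (currentKey != null) {
                handler.accept(currentKey, group);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample rows: caller id and call duration.
        String data = "alice\t10\nalice\t5\nbob\t7\n";
        forEachGroup(new StringReader(data), 0, (key, rows) -> {
            int total = 0;
            for (String[] row : rows) {
                total += Integer.parseInt(row[1]);
            }
            System.out.println(key + " " + total);
        });
    }
}
```

The sketch relies on the file being sorted by the key column. If it is not, rows with the same key end up in separate groups.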
As Java doesn’t directly support dynamically parsing expressions in text files, the computation can only be done by splitting strings manually and then writing a recursive program. The whole process requires a large amount of complicated code that is difficult to maintain. With the assistance of esProc, we can develop a program for the computation in Java without writing this code manually. Let’s look at how esProc works through an example.
Here is a text…Continue
Added by Lynn Guo on December 10, 2014 at 6:30pm — No Comments
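To give a sense of the hand-written recursive program the post above refers to, here is a minimal recursive-descent evaluator in Java. It is a sketch under assumptions: the class name `ExprEval` is hypothetical, and it handles only decimal numbers, `+ - * /`, unary minus, and parentheses, not the expressions in the post's own example.

```java
// Hypothetical sketch of a recursive-descent evaluator for arithmetic
// expressions such as "2*(3+4)-5". Whitespace is stripped up front.
final class ExprEval {
    private final String s;
    private int pos;

    private ExprEval(String text) {
        this.s = text.replaceAll("\\s+", "");
    }

    static double eval(String text) {
        ExprEval parser = new ExprEval(text);
        double value = parser.expr();
        if (parser.pos != parser.s.length()) {
            throw new IllegalArgumentException("Unexpected input at " + parser.pos);
        }
        return value;
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double value = term();
        while (pos < s.length() && (s.charAt(pos) == '+' || s.charAt(pos) == '-')) {
            char op = s.charAt(pos++);
            double right = term();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double value = factor();
        while (pos < s.length() && (s.charAt(pos) == '*' || s.charAt(pos) == '/')) {
            char op = s.charAt(pos++);
            double right = factor();
            value = (op == '*') ? value * right : value / right;
        }
        return value;
    }

    // factor := '-' factor | '(' expr ')' | number
    private double factor() {
        if (pos < s.length() && s.charAt(pos) == '-') {
            pos++;
            return -factor();
        }
        if (pos < s.length() && s.charAt(pos) == '(') {
            pos++;
            double value = expr();
            if (pos >= s.length() || s.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')' at " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < s.length()
                && (Character.isDigit(s.charAt(pos)) || s.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Number expected at " + pos);
        }
        return Double.parseDouble(s.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("2*(3+4)-5"));
    }
}
```

Even this small grammar needs one method per precedence level. Supporting field names, functions, or comparisons would add further cases, which is the maintenance burden the post describes.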
When developing database applications, we often need to perform computations on the data within each group. For example: list the names of the students who have published papers in each of the past three years; compile statistics on the employees who have taken part in all previous training sessions; select the three days on which each client got the highest scores in a golf game; and the like. To perform these computations, SQL needs multi-layered nests, which will…Continue
Added by Lynn Guo on December 3, 2014 at 6:30pm — No Comments
In developing database applications, we usually need to retrieve the records corresponding to the max/min value rather than the value itself. For example: the occasion on which each employee got his/her biggest pay raise; the three lowest scores ever achieved in golf; the five days in each month when each product had its highest sales amount; and so on. As the max function of SQL can only retrieve the max value, instead of the records to which the max…Continue
Added by Lynn Guo on December 1, 2014 at 6:05pm — No Comments
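The difference between retrieving a max value and retrieving the record that holds it can be illustrated with a short Java sketch over in-memory data. The `Raise` type, its fields, and the sample rows are hypothetical and only serve the example; the post itself works against a database.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep, per employee, the whole record with the largest
// raise rather than just the largest amount.
final class MaxRecordDemo {

    // Made-up record type for the example.
    static final class Raise {
        final String employee;
        final String date;
        final double amount;

        Raise(String employee, String date, double amount) {
            this.employee = employee;
            this.date = date;
            this.amount = amount;
        }
    }

    static Map<String, Raise> biggestRaisePerEmployee(List<Raise> raises) {
        Map<String, Raise> best = new LinkedHashMap<>();
        for (Raise r : raises) {
            // merge keeps the record with the larger amount; on a tie the
            // earlier record wins.
            best.merge(r.employee, r,
                    (old, cur) -> cur.amount > old.amount ? cur : old);
        }
        return best;
    }

    public static void main(String[] args) {
        List<Raise> raises = Arrays.asList(
                new Raise("Tom", "2012-03-01", 200.0),
                new Raise("Tom", "2013-03-01", 350.0),
                new Raise("Ann", "2013-06-01", 500.0));
        biggestRaisePerEmployee(raises).forEach((name, r) ->
                System.out.println(name + " " + r.date + " " + r.amount));
    }
}
```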
The following problems arise if you perform conditional filtering on text files in Java alone:
1. The text file is not a database, so it cannot be accessed with SQL. The code needs to be modified whenever the filtering conditions change. Besides, if you want conditional filtering as flexible as that in SQL, you have to program the dynamic expression parsing and evaluation yourself, which means a great amount of programming work.
2. Stepwise loading is required for the big files that…Continue
Added by Lynn Guo on November 23, 2014 at 6:00pm — No Comments
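As a point of reference for the first problem listed above, the sketch below filters delimited text in plain Java. The condition is an ordinary `Predicate` compiled into the program, so changing it means changing code. The class name `TextFilter`, the comma delimiter, and the sample rows are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: reads comma-separated rows line by line and keeps the
// rows that satisfy a condition written in Java.
final class TextFilter {

    static List<String[]> filter(Reader source, Predicate<String[]> condition) {
        // lines() reads lazily, so the whole input is never held as one
        // string; only the matching rows are collected.
        return new BufferedReader(source).lines()
                .filter(line -> !line.isEmpty())
                .map(line -> line.split(",", -1))
                .filter(condition)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows: id, name, salary.
        String data = "1,Tom,3000\n2,Ann,5200\n3,Joe,4100\n";
        List<String[]> rows = filter(new StringReader(data),
                row -> Double.parseDouble(row[2]) > 4000);
        for (String[] row : rows) {
            System.out.println(String.join(",", row));
        }
    }
}
```

The sketch assumes the matching rows fit in memory. A caller dealing with a very large result would process each row as it streams past instead of collecting a list.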
MongoDB can look up the elements of a built-in array by their indexes, but cannot find the indexes from the values of the elements. For example, suppose an array stores people's names in order of their rankings. In MongoDB, names can be found from the rankings (the array indexes), but the rankings cannot be determined from the names. esProc can help MongoDB perform this operation. The following example will teach you how it works in…Continue
Added by Lynn Guo on November 19, 2014 at 6:30pm — No Comments
Java doesn’t support set operations directly, so nested loops have to be used to compute the intersection, union, complement, etc. of text files. If there are many text files, or the file to be computed is too big to be loaded into memory, or the set operations must be performed on multiple fields, the code becomes even more complicated. However, with the assistance of esProc, which supports set operations…Continue
Added by Lynn Guo on November 13, 2014 at 6:00pm — No Comments
Java doesn’t support set operations directly, so nested loops have to be used to compute the intersection, union, complement, etc. of text files. If there are many text files, or the file to be computed is too big to be loaded into memory, or the set operations must be performed on multiple fields, the code becomes even more complicated. However, with the assistance of esProc, which supports set operations directly, Java can realize these…Continue
Added by Lynn Guo on November 11, 2014 at 12:00am — No Comments
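For small files, the set operations mentioned in the two posts above can be sketched with the standard `java.util.Set` bulk methods instead of nested loops. This is an illustration only: the class name `LineSets` is hypothetical, whole lines are compared rather than selected fields, and both inputs must fit in memory, which is exactly the limitation the posts point out.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: treats each text file as a collection of lines and
// combines them with java.util.Set bulk operations.
final class LineSets {

    // Lines present in both inputs, in the order of the first input.
    static Set<String> intersection(Collection<String> a, Collection<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.retainAll(new LinkedHashSet<>(b));
        return result;
    }

    // Lines present in either input, without duplicates.
    static Set<String> union(Collection<String> a, Collection<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.addAll(b);
        return result;
    }

    // Lines of the first input that are absent from the second.
    static Set<String> difference(Collection<String> a, Collection<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.removeAll(new LinkedHashSet<>(b));
        return result;
    }

    public static void main(String[] args) {
        // Made-up lines standing in for the contents of two files.
        Collection<String> f1 = Arrays.asList("a", "b", "c");
        Collection<String> f2 = Arrays.asList("b", "c", "d");
        System.out.println(intersection(f1, f2));
        System.out.println(union(f1, f2));
        System.out.println(difference(f1, f2));
    }
}
```

In practice the line collections could come from `java.nio.file.Files.readAllLines`. Comparing on several fields would require building a composite key per line first.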
esProc can help Java deal with various computations in processing structured text. But when a record spans more than one row, the data must be preprocessed before esProc can perform computations on it.
Let’s look at this through an example. The text file Social.txt holds the access records of a website, in which every three rows correspond to one record. The records must be rearranged before other computations can be performed. They should be imported in the form…Continue
Added by Lynn Guo on November 4, 2014 at 8:30pm — No Comments
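The rearrangement step in the post above, where every three rows form one record, could look roughly like this in plain Java. The excerpt does not show the target layout, so joining the rows with a tab is purely an assumption, as are the class name `MultiLineRecords` and the sample lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: merges every N consecutive lines into one
// single-row record.
final class MultiLineRecords {

    static List<String> joinEvery(List<String> lines, int rowsPerRecord,
                                  String separator) {
        List<String> records = new ArrayList<>();
        // A trailing group with fewer than rowsPerRecord lines is dropped.
        for (int i = 0; i + rowsPerRecord <= lines.size(); i += rowsPerRecord) {
            records.add(String.join(separator,
                    lines.subList(i, i + rowsPerRecord)));
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up lines standing in for the contents of Social.txt.
        List<String> lines = Arrays.asList(
                "user1", "2014-11-01", "/home",
                "user2", "2014-11-02", "/about");
        for (String record : joinEvery(lines, 3, "\t")) {
            System.out.println(record);
        }
    }
}
```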