A Data Science Central Community
As the R programming language becomes more and more popular among data science groups, and as industries, researchers, and companies embrace R, I will be writing a series of posts on learning data science using R. The tutorial course will cover R data types, handling data in R, probability theory, machine learning (supervised and unsupervised learning), data visualization in R, and more. Before going further, let’s look at some stats and tidbits on data science and…
Data environments are growing exponentially. IDC reports that compound annual growth in data through 2020 will be almost 50% per year. Not only is there more data, but there are more data sources. According to Ventana Research, 70% of organizations need to integrate more than 6 disparate data sources. At the same time, the value of unlocking that data and using it to make business decisions is also increasing. For the business user, understanding this complex data and unlocking its potential…
Added by Eran Levy on November 24, 2015 at 12:00pm
Recently I came across the term CRISP-DM, a data mining standard. Although the process is not new, I feel every analyst should know this commonly used, industry-wide process. In this post I will explain the different phases involved in creating a data mining solution.
CRISP-DM, an acronym for Cross Industry Standard Process for Data Mining, is a data mining process model that includes commonly used approaches that data…
This blog post, written by Adrash Matthew, grew out of a discussion with Mr TCA Srinivasa Raghavan on the current state of data systems in India and the case for an audacious overhaul in light of the ‘Digital India’ initiative. Mr Raghavan is a Senior Associate Editor at the Hindu Business Line. He has been a consultant to the Reserve Bank of India's history project and an advisor to the Director of ICRIER. He is also a Distinguished Fellow of the Institute of Peace and Conflict…
Added by Athena Infonomics on October 17, 2015 at 2:00am
Data lineage is about tracking the flow of information. It is necessary to guarantee the quality, usability and security of your data. For large organizations, it is also a key compliance requirement. With Linkurious, it is possible to take a graph-based approach to these challenges.
The success of an organization depends on the quality, usability and security of its data. Want to provide amazing support to your customer? Create new products…
Added by Jean Villedieu on October 13, 2015 at 1:47am
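To make the graph-based idea concrete, here is a minimal sketch that does not use Linkurious (its API is not shown in the post). Each edge records that one dataset was derived from another, so tracing every upstream source of a report is a breadth-first traversal over parent links. The class name, method names, and dataset names are hypothetical, and the code assumes Java 9 or later for `Set.of()`.

```java
import java.util.*;

// Hypothetical lineage graph sketch (not Linkurious's API):
// an edge source -> derived means "derived was produced from source".
class LineageGraph {
    // For each dataset, the datasets it was directly derived from (its parents).
    private final Map<String, Set<String>> parents = new HashMap<>();

    // Record that 'derived' was produced from 'source'.
    void addFlow(String source, String derived) {
        parents.computeIfAbsent(derived, k -> new LinkedHashSet<>()).add(source);
        parents.computeIfAbsent(source, k -> new LinkedHashSet<>());
    }

    // Breadth-first search over parent links: every dataset that feeds 'target',
    // directly or indirectly. The 'seen' set prevents revisiting shared ancestors or cycles.
    Set<String> upstream(String target) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(parents.getOrDefault(target, Set.of()));
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (seen.add(node)) {
                queue.addAll(parents.getOrDefault(node, Set.of()));
            }
        }
        return seen;
    }
}
```

Answering "which sources does this report depend on?" is then a single `upstream(...)` call. The same structure answers impact questions ("what breaks if this source changes?") if you also store child links.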
NASA is using big data to make complex knowledge more readily available. Learn how graph visualization can help turn a large corpus of documents into concrete insights.
Even in a mature and knowledge-driven organization like NASA, finding an answer to a common business issue can be frustrating. Past surveys at NASA have shown that most people have trouble finding the…
Added by Jean Villedieu on August 31, 2015 at 7:14am
Added by Athena Infonomics on August 31, 2015 at 12:00am
DFS (daily fantasy sports), “the next big thing”, is taking North America by storm and is slowly knocking on Europe's door. The way it works is simple: sports lovers select a team of real-world athletes…
Added by Jure Rejec on July 28, 2015 at 7:00am
Nowadays the job of a data analyst has become very broad, since it must cover a number of different and ever-changing tasks.
A data analyst must query a variety of internal and external data sources, each with a different access protocol and format; integrate this data with results from REST and web services queried over the Internet, such as Google APIs or any social media channel; exchange information with business analysts, who, while lacking the deep mathematical background,…
Added by Rosaria Silipo on July 14, 2015 at 2:27am
Random Forest is a machine learning algorithm used for classification, regression, and feature selection. It's an ensemble technique, meaning it combines the outputs of many instances of a weaker technique in order to get a stronger result.
The weaker technique in this case is a decision tree. Decision trees work by splitting and re-splitting the data by…
Added by Alex Woods on July 4, 2015 at 8:30am
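The "combine many weak trees" idea can be illustrated with a minimal sketch of only the final step of a random-forest classifier: each tree votes for a class and the forest returns the majority. The "trees" here are hypothetical hand-written threshold rules standing in for trained decision trees. A real random forest also trains each tree on a bootstrap sample and considers a random subset of features at each split, and this sketch omits both.

```java
import java.util.*;
import java.util.function.Function;

// Sketch of random-forest aggregation for classification only.
// Each "tree" is any function from a feature vector to a class label (hypothetical stand-ins here).
class ForestVote {
    static String predict(List<Function<double[], String>> trees, double[] x) {
        Map<String, Integer> votes = new HashMap<>();
        for (Function<double[], String> tree : trees) {
            votes.merge(tree.apply(x), 1, Integer::sum);   // one vote per tree
        }
        // The label with the most votes wins (ties are broken arbitrarily).
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}
```

For regression, the forest would average the trees' numeric outputs instead of taking a vote.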
In 2012, Harvard Business Review called data scientist the sexiest job of the 21st century. Just two months ago LinkedIn shared the “25 Hottest Skills that Got People Hired in 2014” – guess what type of workers possessed these skills? This attention has been followed by a slew of articles telling budding analysts the skills they’ll need to get to the top of the data scientist food…
Added by Elana Roth on June 29, 2015 at 3:00am
We all know that time is money, especially when you're paying a data scientist. But the New York Times reports that...
"Data scientists, according to interviews and expert estimates, spend 50 percent to 80 percent of their time mired in [the] mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets."…
Added by Rosaria Silipo on May 6, 2015 at 12:30am
"Analytics is a journey and not a destination!! It takes considerable effort to frame that journey and execute it with a sense of purpose. You will encounter stumbling blocks that may threaten your initiative but you need to find a way out and keep marching ahead."
We recently did an analytics exercise for a US client in the education domain that had every flavor of roadblock…
Why use big data tools to analyse web analytics data?
Because web event data is incredibly valuable. It tells us how our customers actually behave (in lots of detail) and how that behavior varies. We can also compare different customers, or analyze the same customers over time.…
Added by VINU KIRAN .S on January 31, 2015 at 5:19am
When importing structured text files into a database using Java alone, we need to assemble the SQL statements manually and also handle various troublesome situations: whether a record already exists in the table (and so should be updated rather than inserted), whether certain fields are present in the file, and whether the fields in the file are consistent with those in the table.
When esProc is used together with Java, these problems can be solved…
Added by Lynn Guo on December 22, 2014 at 7:04pm
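The update-or-insert decision and the field checks described above can be sketched in plain Java. To keep the example self-contained, a `HashMap` keyed by the first column stands in for the database table, so no JDBC or esProc calls are shown. The comma-separated layout, the header row, the first-column primary key, and the class and method names are all assumptions made for illustration.

```java
import java.util.*;

// Hypothetical sketch: import comma-separated lines (first line = header) into an in-memory
// "table" keyed by the first column. Existing rows are updated, new rows are inserted (an upsert).
class UpsertImport {
    static Map<String, Map<String, String>> importLines(List<String> lines,
                                                        List<String> tableColumns,
                                                        Map<String, Map<String, String>> table) {
        String[] header = lines.get(0).split(",");
        // Check that every field in the file exists in the table before importing anything.
        for (String field : header) {
            if (!tableColumns.contains(field)) {
                throw new IllegalArgumentException("Unknown field in file: " + field);
            }
        }
        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split(",", -1);      // -1 keeps trailing empty fields
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                row.put(header[i], values[i]);
            }
            String key = row.get(header[0]);            // assumed primary key: first column
            Map<String, String> existing = table.get(key);
            if (existing != null) {
                existing.putAll(row);                    // "UPDATE": overwrite only fields present in the file
            } else {
                table.put(key, row);                     // "INSERT": new record
            }
        }
        return table;
    }
}
```

Fields missing from the file are left untouched on update, which is one way to handle the "some fields are included in the file" case. A real database would need the equivalent choice expressed in SQL.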
Some text files are too big to be loaded into memory in their entirety. However, because their data are sorted by a certain column, they can be imported in groups according to that column, and each group fits in memory for computing. Examples include a telecom company's call detail records, website visitor statistics, and a shopping mall's membership information.
A great deal of complicated code, which is difficult to maintain, is…
Added by Lynn Guo on December 15, 2014 at 6:24pm
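Reading a sorted file group by group can be sketched in plain Java with a `BufferedReader`. Lines are accumulated while the key column stays the same, and each completed group is passed to a callback before the next one starts, so only one group is held in memory at a time. The comma-separated format, the key in the first column, and the callback design are assumptions for illustration. This is not esProc's approach.

```java
import java.io.*;
import java.util.*;
import java.util.function.Consumer;

// Hypothetical sketch: stream a text file that is sorted by its first column
// and process it one key-group at a time.
class GroupedReader {
    static void forEachGroup(Reader source, Consumer<List<String[]>> handler) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            List<String[]> group = new ArrayList<>();
            String currentKey = null;
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(",");
                String key = fields[0];
                // Because the file is sorted, a new key means the previous group is complete.
                if (currentKey != null && !key.equals(currentKey)) {
                    handler.accept(group);
                    group = new ArrayList<>();
                }
                currentKey = key;
                group.add(fields);
            }
            if (!group.isEmpty()) {
                handler.accept(group);                  // flush the last group
            }
        }
    }
}
```

In practice `source` would be something like a `FileReader` over a call detail record file sorted by customer ID. The approach depends entirely on that sort order, so an unsorted file would produce fragmented groups.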
Added by Vikas Kamra on December 1, 2014 at 4:30am
The following problems arise if you perform conditional filtering on text files in Java alone:
1. A text file is not a database, so it cannot be queried with SQL. The code must be modified whenever the filtering conditions change. Moreover, if you want conditional filtering as flexible as SQL's, you have to write your own dynamic expression parsing and evaluation, which means a great deal of programming work.
2. Stepwise loading is required for the big files that…
Added by Lynn Guo on November 23, 2014 at 6:00pm
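One plain-Java way to avoid rewriting the filtering loop whenever the conditions change is to stream the file line by line, so a big file is never loaded at once, and to pass the condition in as a `java.util.function.Predicate`. This is only a partial answer to the problems listed above. Conditions are still written in Java rather than parsed from SQL-like strings, and the matching rows are collected into memory. The comma-separated column layout and the class name are hypothetical.

```java
import java.io.*;
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: filter records of a text file lazily, with the condition supplied by the caller.
class TextFilter {
    static List<String[]> filter(Reader source, Predicate<String[]> condition) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            return in.lines()                          // reads lazily, one line at a time
                     .map(line -> line.split(","))
                     .filter(condition)
                     .collect(Collectors.toList());    // only matching rows are kept
        }
    }
}
```

Conditions can then be combined without touching the loop. For example, assuming columns `id,city,amount`, you could write `Predicate<String[]> big = r -> Double.parseDouble(r[2]) > 100;` and pass `big.and(r -> r[1].equals("Oslo"))`.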
Java doesn’t support set operations on text files directly, so you have to implement intersection, union, complement, and similar operations between text files yourself, typically with nested loops. If there are many text files, if a file is too big to be loaded into memory, or if the set operations must be performed on multiple fields, the code becomes even more complicated. However, with the assistance of esProc, which supports set operations…
Added by Lynn Guo on November 13, 2014 at 6:00pm
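For files small enough to fit in memory, the standard `java.util.Set` methods `retainAll`, `addAll`, and `removeAll` give intersection, union, and difference without hand-written nested loops. The sketch below compares whole lines only. It does not cover the harder cases mentioned above: comparing on selected fields, or handling files larger than memory. The class name is hypothetical.

```java
import java.util.*;

// Hypothetical sketch: set operations on the lines of two text files, assuming both fit in memory.
// LinkedHashSet keeps the order in which lines first appear.
class LineSets {
    static Set<String> intersection(Collection<String> a, Collection<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.retainAll(new HashSet<>(b));    // keep only lines also present in b
        return result;
    }

    static Set<String> union(Collection<String> a, Collection<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.addAll(b);                      // add lines of b not already present
        return result;
    }

    static Set<String> difference(Collection<String> a, Collection<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.removeAll(new HashSet<>(b));    // lines of a that are not in b (complement of b within a)
        return result;
    }
}
```

The inputs can come straight from `Files.readAllLines(Paths.get("a.txt"))`, where the file name is a placeholder. Duplicate lines within one file collapse into a single element, which matches set semantics but may not suit every use case.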