A Data Science Central Community
R is widely used among scientists and statisticians to perform statistical analysis, while Salesforce.com is one of the leading CRM software packages used for marketing and sales force automation. Salesforce.com contains vital information regarding Leads, Customers, Contacts, Opportunities and Cases. Currently this data is mainly used for operational purposes by sales and marketing professionals.
How about using Salesforce CRM data for predictive analysis or…
Added by Pradip Banerjee on August 3, 2016 at 8:00pm — No Comments
As the R programming language becomes more and more popular among data science groups, and as industries, researchers and companies embrace R, I will be writing posts on learning data science using R. The tutorial course will include topics on R data types, handling data using R, probability theory, machine learning, supervised and unsupervised learning, data visualization using R, etc. Before going further, let’s just see some stats and tidbits on data science and…Continue
This post brings the audience a few glimpses (strictly) of insights that…
Added by Dr. Pradeep Mavuluri on December 16, 2015 at 3:02am — No Comments
Dear R programmers,
In May 2015, our favorite "R" climbed almost to 12th position in the popular TIOBE Programming Community Index (TIOBE Index); however, it experienced some volatility after that and couldn't move further into the top 10. Currently, it holds 19th rank (for this month); wishing it retains its position in the top 20 through the rest of the year and hope to move…Continue
Linear regression is one of the first things you should try if you’re modeling a linear relationship (actually, non-linear relationships too!). It’s fairly simple, and probably the first thing to learn when tackling machine learning.
At first, linear regression shows up just as a simple equation for a line. In machine learning, the weights are usually represented by a vector θ (in statistics they’re often represented…Continue
When you're cleaning up data, you usually end up using 5-8 functions a ton of times, and then a few more once or twice. Here are the 5-8 functions I find myself using again and again.
Here is a quick overview:
names() - returns the column names of a dataset…Continue
Added by Alex Woods on July 4, 2015 at 8:00am — No Comments
It's taking more time to create the data frame every day. Has anyone here worked in similar situations? How did you…
Added by suresh kumar Gorakala on April 20, 2015 at 3:30am — No Comments
Cross-row and group computation often involves computing the link relative ratio and the year-on-year comparison. The link relative ratio compares the current period's data with the data of the previous period. Generally, the month is used as the time interval. For example, compare the sales amount of April with that of March, and the growth rate we get is the link relative ratio of April. Hour, day, week and quarter can also be used as the time…Continue
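The two comparisons described above can be sketched in a few lines. This is a minimal illustration in Python rather than R, using only the standard library. The sales figures, the "YYYY-MM" period format and all function names are hypothetical and not taken from the original post. Year-on-year is assumed here to mean a comparison with the same month of the previous year.

```python
# Hypothetical monthly sales figures, invented purely for illustration.
sales = {
    "2013-03": 200.0,
    "2013-04": 250.0,
    "2014-03": 220.0,
    "2014-04": 300.0,
}


def growth_rate(current, previous):
    """Relative change from previous to current (0.25 means +25%)."""
    return (current - previous) / previous


def previous_month(period):
    """Return the 'YYYY-MM' label of the month before the given one."""
    year, month = map(int, period.split("-"))
    if month == 1:
        return f"{year - 1}-12"
    return f"{year}-{month - 1:02d}"


def same_month_last_year(period):
    """Return the 'YYYY-MM' label of the same month one year earlier."""
    year, month = period.split("-")
    return f"{int(year) - 1}-{month}"


def link_relative_ratio(data, period):
    """Growth versus the previous month, or None if that month is missing."""
    prev = previous_month(period)
    return growth_rate(data[period], data[prev]) if prev in data else None


def year_on_year(data, period):
    """Growth versus the same month last year, or None if it is missing."""
    prev = same_month_last_year(period)
    return growth_rate(data[period], data[prev]) if prev in data else None
```

With this sample data, `link_relative_ratio(sales, "2014-04")` compares April 2014 with March 2014, while `year_on_year(sales, "2014-04")` compares it with April 2013. Both return `None` when the earlier period is absent instead of raising an error.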
Added by Jessica May on September 22, 2014 at 2:00am — No Comments
It is common to use the R language to group and summarize data from files. Sometimes we find ourselves processing comparatively big files whose source data is large but whose computed result is small. We cannot load such files wholly into memory when we need to compute on them. The only solution is to import and compute the data in batches and then merge the results. We’ll use the following example to illustrate how the R language groups and summarizes data from big text files.
Here is a file,…Continue
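The batch-import, compute and merge idea described above can be sketched as follows. This is a rough illustration in Python rather than R, and it is not the method used in the original post. The tab-separated "key, amount" layout, the sample rows and every function name are assumptions made for the example.

```python
import io
from collections import defaultdict
from itertools import islice


def summarize_batch(lines):
    """Group one batch of 'key<TAB>amount' lines and sum amounts per key."""
    totals = defaultdict(float)
    for line in lines:
        key, amount = line.rstrip("\n").split("\t")
        totals[key] += float(amount)
    return totals


def merge_totals(overall, partial):
    """Fold one batch's partial sums into the running result."""
    for key, value in partial.items():
        overall[key] += value


def summarize_in_batches(handle, batch_size):
    """Read batch_size lines at a time so the whole file is never in memory."""
    overall = defaultdict(float)
    while True:
        batch = list(islice(handle, batch_size))
        if not batch:
            break
        merge_totals(overall, summarize_batch(batch))
    return dict(overall)


# A small in-memory stand-in for a big text file (hypothetical data).
sample = io.StringIO("east\t10\nwest\t5\neast\t2.5\nnorth\t1\nwest\t5\n")
result = summarize_in_batches(sample, batch_size=2)
```

Only one batch and the small dictionary of per-group totals are held in memory at any time. This works because a sum can be merged from partial sums. A statistic such as a median would need a different merging strategy.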
Both esProc and the R language are typical data processing and analysis languages with two-dimension…Continue
Added by Jessica May on July 21, 2014 at 1:36am — No Comments
In our day-to-day life, we come across a large number of recommendation engines, such as the Facebook recommendation engine for friend suggestions and suggestions of similar pages to like, and the YouTube recommendation engine, which suggests videos similar to our previous searches and preferences. In today’s blog post I will explain how to build a basic recommender system.…
Added by suresh kumar Gorakala on June 5, 2014 at 10:55pm — No Comments
In today’s blog post, we shall look into time series analysis using the R package forecast. The objective of the post is to explain the different methods available in the forecast package that can be applied to time series analysis and forecasting.
A time series is a collection of observations of well-defined data items…Continue
Ever since I started working with R, I have wondered how I could present the results of my statistical models as web applications. After doing some research on the internet, I came across ShinyR, a new package from RStudio that can be used to develop interactive web applications with R.
Before going into how to build web apps using R, let me give you an overview of ShinyR.
Added by suresh kumar Gorakala on March 23, 2014 at 1:30am — No Comments
Both R and Python should be measured by their effectiveness in advanced analytics and data science. Initially, as newcomers to the data science field, we spend a good amount of time understanding the pros and cons of the two. I too carried out this study solely for “self”, to decide which tool I should pick to go deeper into data science. Eventually, I started realizing that both R and Python have their own areas of mastery, along with broad support for data science. Here some…Continue
Added by Manish Bhoge on February 7, 2014 at 11:22pm — No Comments
Statistics.com, a provider of online education in statistics and analytics, announces a partnership with CrowdANALYTIX, a predictive modeling “managed crowdsourcing” company, offering a new online course, “Applied Predictive Analytics in partnership with CrowdANALYTIX”, which will run from Oct. 11 to Nov. 8, 2013.
The goal of this course is to teach users (who have basic knowledge of R programming, predictive analytics…Continue
Added by Janet Dobbins on September 11, 2013 at 10:59am — No Comments
In my previous post, we saw that R-squared can lead to a misleading interpretation of the quality of our regression fit in terms of predictive power. One thing that R-squared offers no protection against is overfitting. Cross validation, on the other hand, by allowing us to have cases in our testing set that are different from the cases in our training set, inherently offers protection against overfitting.
1. Do-it-yourself leave-one-out cross validation in R.
In this type…Continue
R-squared, also known as the coefficient of determination, is a popular measure of quality of fit in regression. However, it does not offer any significant insight into how well our regression model can predict future values. Instead, the PRESS statistic (the predicted residual sum of squares) can be used as a measure of predictive power. The PRESS statistic can be computed in the leave-one-out cross validation…Continue
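As a rough sketch of the idea, PRESS can be computed by refitting the model once per observation, each time leaving that observation out and predicting it from the rest. The example below is written in Python rather than R and assumes a simple one-predictor least-squares line. The function names and the tiny data set are hypothetical and are not the original post's code.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residual_sum_of_squares(xs, ys):
    """In-sample squared error of the line fitted to all observations."""
    intercept, slope = fit_line(xs, ys)
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def press(xs, ys):
    """Leave-one-out PRESS: each point is predicted by a fit that excludes it."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        predicted = intercept + slope * xs[i]
        total += (ys[i] - predicted) ** 2
    return total


# Hypothetical data: three points that no straight line fits well.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 0.0]
in_sample_error = residual_sum_of_squares(xs, ys)
out_of_sample_error = press(xs, ys)
```

For this data the leave-one-out error is much larger than the in-sample residual sum of squares, which is the kind of gap that R-squared alone does not reveal. Refitting once per observation is the simplest approach and costs one extra fit per data point.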
Hadoop (MapReduce, where code is turned into map and reduce jobs, and Hadoop runs the jobs) is the best-known technology used for "Big Data" because it allows an organization to store huge quantities of data at very…Continue
Added by Michael Walker on November 7, 2012 at 5:55pm — No Comments
MSc Applied Econometrics
MSc Econometrics is a new degree-level program that is offered online. It is one…