
June 2015 Blog Posts (14)

Time issue in creating a huge data frame from MongoDB collection

I am using R to build a predictive model. My data is stored in MongoDB collections, so I connected to MongoDB from R to fetch the data and build a data frame: I iterate over a cursor, convert each BSON document to a list, and store the records (about 64,567 of them). This took about 15 minutes.

It's taking longer to create the data frame every day. Has anyone here worked in a similar situation? How did you…


Added by rupesh p on June 29, 2015 at 2:14pm — 2 Comments
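The post doesn't show its code, so the cause here is an assumption: a common reason this kind of loop slows down as the collection grows is that the data frame is extended one record at a time (for example with `rbind` inside the cursor loop), which copies the whole frame on every iteration and makes the total cost grow quadratically. The usual remedy is to collect all records first and build each column once. A language-neutral sketch of that pattern in Python, using only the standard library; plain dicts stand in for BSON documents, and `documents_to_columns` is a hypothetical helper:

```python
from typing import Any, Dict, Iterable, List


def documents_to_columns(cursor: Iterable[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Build one list per field from a stream of documents.

    `cursor` is any iterable of dicts (a stand-in for a MongoDB cursor).
    Fields missing from a document are filled with None so that every
    column ends up the same length, as a data frame requires.
    """
    docs = list(cursor)  # drain the cursor once; list appends are amortized O(1)

    # Collect field names in first-seen order across all documents.
    fields: List[str] = []
    seen = set()
    for doc in docs:
        for key in doc:
            if key not in seen:
                seen.add(key)
                fields.append(key)

    # Build each column in a single pass instead of growing a table row by row.
    return {field: [doc.get(field) for doc in docs] for field in fields}


# Usage with three hypothetical documents:
columns = documents_to_columns(iter([{"a": 1, "b": 2}, {"a": 3}, {"c": 5}]))
print(columns)  # {'a': [1, 3, None], 'b': [2, None, None], 'c': [None, None, 5]}
```

In R, the same idea means accumulating the converted documents in a list and combining them once at the end (for example with a single `do.call(rbind, ...)`) rather than calling `rbind` on every iteration.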

6 Tips for Being an Awesome Data Scientist

In 2012, Harvard Business Review called data scientist the sexiest job of the 21st century. Just two months ago LinkedIn shared the “25 Hottest Skills that Got People Hired in 2014” – guess what type of workers possessed these skills? This attention has been followed by a slew of articles telling budding analysts the skills they’ll need to get to the top of the data scientist food…


Added by Elana Roth on June 29, 2015 at 3:00am — No Comments

Good Friday Reading

Here's the new selection of data science articles and resources, freshly posted and approved in the last 30 minutes.


Added by Vincent Granville on June 26, 2015 at 10:17am — No Comments

Data scientists are wasting their time

We all know that time is money, especially when you're paying a data scientist. But the New York Times reports that... 

"Data scientists, according to interviews and expert estimates, spend 50 percent to 80 percent of their time mired in [the] mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets."…


Added by Jennifer Methvin on June 25, 2015 at 4:30am — 4 Comments

Comparing MongoDB with MySQL

The comparison between MongoDB, the poster child of NoSQL, and MySQL has been raging for a while now. Knowing the differences between the two will help you make an informed decision.


The Major Differences between MongoDB and MySQL

1. There is a difference in the representation of data in the two databases. In…


Added by Jenny Richards on June 24, 2015 at 12:45am — 1 Comment
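The excerpt is cut off before it explains the first difference, so the following only illustrates the general point and is not the article's own example: MongoDB stores JSON-like (BSON) documents that can nest arrays inside a record, while MySQL stores flat rows in tables linked by keys. The sketch uses Python's built-in `sqlite3` as a stand-in for MySQL, and the order/item schema is hypothetical:

```python
import json
import sqlite3

# Document model (MongoDB-style): one self-contained record with a nested array.
order_doc = {
    "_id": 1,
    "customer": "Alice",
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
}

# Relational model (MySQL-style; SQLite used here as a stand-in):
# the same data split into two tables linked by a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    sku TEXT NOT NULL,
    qty INTEGER NOT NULL
);
""")
conn.execute("INSERT INTO orders VALUES (?, ?)", (order_doc["_id"], order_doc["customer"]))
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order_doc["_id"], item["sku"], item["qty"]) for item in order_doc["items"]],
)

# Rebuilding the nested view from the tables requires a join.
rows = conn.execute("""
    SELECT o.id, o.customer, i.sku, i.qty
    FROM orders o JOIN order_items i ON i.order_id = o.id
    ORDER BY i.sku
""").fetchall()

rebuilt = {
    "_id": rows[0][0],
    "customer": rows[0][1],
    "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
}
print(json.dumps(rebuilt))
```

The document form keeps everything about one order together and is read back without a join; the relational form avoids repeating the customer on every item and lets the database enforce the link between tables.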

Model Deployment: the "Now What?" effect

Model Deployment

I have built a model that scores 99.9% accuracy! Great! Fantastic!

Now what?

This is what a colleague of mine calls the "Now what?" effect. After training, testing, and optimizing a model repeatedly, we get this fantastic performance on the evaluation set. Now it is time to put the model to good use on real-life, possibly streaming, data. This phase is called Model…


Added by Rosaria Silipo on June 23, 2015 at 1:28am — No Comments
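The excerpt stops before describing how deployment is done, so this is only a hedged sketch of the general idea: the training step exports the learned parameters, and a separate scoring step loads them once and applies the model to records as they arrive. `ThresholdModel`, `to_json`, `from_json`, and `score_stream` are hypothetical names, and the one-feature threshold classifier is a toy stand-in for a real trained model:

```python
import json
from typing import Iterable, Iterator, Tuple


class ThresholdModel:
    """Hypothetical stand-in for a trained model: predicts 1 when one
    feature exceeds a threshold chosen during training."""

    def __init__(self, feature: str, threshold: float):
        self.feature = feature
        self.threshold = threshold

    def predict(self, record: dict) -> int:
        return int(record[self.feature] > self.threshold)

    def to_json(self) -> str:
        # Persist only the learned parameters, not the Python object itself.
        return json.dumps({"feature": self.feature, "threshold": self.threshold})

    @classmethod
    def from_json(cls, text: str) -> "ThresholdModel":
        params = json.loads(text)
        return cls(params["feature"], params["threshold"])


def score_stream(model_json: str, records: Iterable[dict]) -> Iterator[Tuple[dict, int]]:
    """Deployment side: load the model once, then score records lazily,
    so the same code handles a finite batch or an unbounded stream."""
    model = ThresholdModel.from_json(model_json)
    for record in records:
        yield record, model.predict(record)


artifact = ThresholdModel("amount", 100.0).to_json()  # produced at training time
incoming = iter([{"amount": 50}, {"amount": 250}])     # stands in for a live stream
print([label for _, label in score_stream(artifact, incoming)])  # [0, 1]
```

Keeping the scorer a generator means it never holds the whole stream in memory; it pulls one record, scores it, and hands the result on.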

Handcuffing Users with Packaged BI Dashboards?

Business intelligence solutions have come a long way in the past five years with continued innovation and transformation from traditional BI to data visualization and data discovery. With the advent of improved BI tools and accessibility, many businesses are using precious IT budgets to add self-serve business intelligence to their solution infrastructure and put the power of self-serve BI tools into the hands of their employees and business users.

Vendors like …


Added by Kartik Patel on June 23, 2015 at 12:00am — No Comments

Even without any "golden feature", multivariate modeling can work

A/B testing is widely used in online marketing, Internet ad management, and other routine analytics. In general, people use it to look for "golden features (metrics)" that are vital points for growth hacking. To evaluate an A/B test, statistical hypothesis tests such as the t-test are used, and people try to find any metric with a significant effect across conditions. If you successfully find a metric with a significant difference between design A and B of a click button,…


Added by Takashi J. OZAKI on June 18, 2015 at 9:00am — No Comments
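The post names the t-test as the usual per-metric check that its multivariate approach is contrasted with. A minimal sketch of that step for one metric measured under designs A and B, using Welch's unequal-variance variant (a choice made here; the post doesn't say which variant it means). The sample values are illustrative, not real data:

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples whose variances may differ."""
    na, nb = len(a), len(b)
    se2_a = variance(a) / na  # squared standard error of mean(a)
    se2_b = variance(b) / nb  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value; only reasonable when
    both groups are large, since it ignores the t distribution's heavier tails."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


a = [1, 2, 3, 4, 5]  # hypothetical metric values under design A
b = [2, 3, 4, 5, 6]  # hypothetical metric values under design B
t, df = welch_t_test(a, b)
print(t, df)  # -1.0 8.0
```

With only eight degrees of freedom, as in this toy example, the p-value should come from the t distribution (for example `scipy.stats.ttest_ind(a, b, equal_var=False)`, which runs the same Welch test) rather than from the normal approximation.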

10 Common NLP Terms Explained for the Text Mining Novice

If you’re relatively new to the Natural Language Processing and Text Mining world, you’ve more than likely come across some pretty technical terms and acronyms that are challenging to get your head around, especially if you’re relying on scientific definitions for a plain and simple explanation.

We decided to put together a list of 10 common terms in Natural Language Processing, broken down in layman’s terms to make them easier to understand. So if you don’t know your…


Added by Mike Waldron on June 17, 2015 at 4:38am — No Comments

R or Python, a practical problem

Which technology works best for a team that is just introducing data mining? The team has been using Excel as its data analysis tool; how can we apply or run a data mining model (such as a decision tree) in Excel? I have been using R for a while and enjoy it very much, and there is a good supply of fresh graduates trained in R. However, when it comes to using and running the data mining model, R does a very poor job in execution. It is such a good tool when we develop and research patterns in a laboratory…


Added by Jeffrey Ng on June 16, 2015 at 11:14am — No Comments

Gauge Symmetry Group and Replication of Contingent Claims

Dear Colleagues,

I would like to bring to your attention my 1998 book on beliefs-preferences gauge symmetry, which might be of interest to you:


[1]  V.A. Kholodnyi, Beliefs-Preferences Gauge Symmetry Group and Replication of Contingent Claims in a General Market Environment, IES Press, Research Triangle Park, North Carolina, 1998.


I introduced the beliefs-preferences gauge symmetry as well as the related differential-geometrical and…


Added by Valery A. Kholodnyi on June 11, 2015 at 2:00am — No Comments

Morning Analytic Coffee Blog, Vol. #3

Good Morning and Welcome to this edition of the Morning Analytic Coffee Blog.

Today, we talk about understanding project parameters and being eager to please. Project parameters are one of the best and worst things for the analyst: written clearly and well understood, they can really help provide the structure for the results needed. Notice that I didn’t say “results desired…


Added by Richard D. Quodomine on June 9, 2015 at 10:30am — No Comments

Overfitting or generalized? Comparison of ML classifiers - a series of articles

On my own blog I wrote a series of articles about how the major machine learning classifiers work, with visualizations of their decision boundaries on various datasets.


Added by Takashi J. OZAKI on June 5, 2015 at 4:00am — No Comments

Fraud detection in retail with graph analysis

Fraud detection is all about connecting the dots. We are going to see how to use graph analysis to identify stolen credit cards and fake identities. For this article we worked with Ralf Becher, a Qlik Luminary who provides solutions for integrating the graph approach into Business Intelligence solutions such as QlikView and Qlik Sense.

Third party fraud in retail

Third party fraud occurs when a…


Added by Jean Villedieu on June 2, 2015 at 10:32am — 1 Comment
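The excerpt doesn't show the article's own graph or Qlik implementation. As a stand-alone sketch of the "connecting the dots" idea, assume applications are linked whenever they share an attribute value such as a phone number or address; the linked groups are then the connected components of that graph, found here with a union-find pass, and unusually large components are candidates for review. The field names, sample data, and the `find_rings` helper are hypothetical:

```python
from collections import defaultdict


def find_rings(applications, min_size=2):
    """Group applications linked, directly or transitively, through any
    shared attribute value (phone, address, card, ...).

    `applications` maps an application id to a dict of attribute -> value.
    Returns the connected components with at least `min_size` members.
    """
    parent = {app_id: app_id for app_id in applications}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # Every application sharing an (attribute, value) pair belongs in one group;
    # linking each to the first application seen with that pair is enough.
    first_holder = {}
    for app_id, attrs in applications.items():
        for key, value in attrs.items():
            node = (key, value)
            if node in first_holder:
                union(first_holder[node], app_id)
            else:
                first_holder[node] = app_id

    groups = defaultdict(list)
    for app_id in applications:
        groups[find(app_id)].append(app_id)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]


applications = {
    "a1": {"phone": "555-0101", "address": "1 Main St"},
    "a2": {"phone": "555-0101", "address": "9 Oak Ave"},
    "a3": {"phone": "555-0199", "address": "9 Oak Ave"},
    "a4": {"phone": "555-0123", "address": "7 Elm Rd"},
}
print(find_rings(applications))  # [['a1', 'a2', 'a3']]
```

Here `a1` and `a3` share nothing directly, but both connect through `a2`; that transitive link is exactly what row-by-row checks miss and a graph view exposes.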


© 2021   TechTarget, Inc.
