All Blog Posts Tagged 'data' (134)

Data Science with R

As the R programming language becomes more and more popular among data scientists, with industries, researchers, and companies embracing it, I will be writing posts on learning data science using R. The tutorial course will include topics on R data types, handling data using R, probability theory, machine learning, supervised and unsupervised learning, data visualization using R, etc. Before going further, let’s just see some stats and tidbits on data science and…

Continue

Added by suresh kumar Gorakala on December 29, 2015 at 9:30am — 1 Comment

Know Your Data: The Data Complexity Matrix

Data environments are growing exponentially. IDC reports that compound annual growth in data through 2020 will be almost 50% per year. Not only is there more data, but there are more data sources. According to Ventana Research, 70% of organizations need to integrate more than 6 disparate data sources. At the same time, the value of unlocking that data and using it to make business decisions is also increasing. For the business user, understanding this complex data and unlocking its potential…

Continue

Added by Eran Levy on November 24, 2015 at 12:00pm — No Comments

Cross Industry Standard for Data Mining

Recently I came across the term CRISP-DM, a data mining standard. Though this process is not new, I feel every analyst should know about this commonly used, industry-wide process. In this post I will explain the different phases involved in creating a data mining solution.

CRISP-DM, an acronym for Cross Industry Standard Process for Data Mining, is a data mining process model that includes commonly used approaches that data…

Continue

Added by suresh kumar Gorakala on October 22, 2015 at 10:59am — 2 Comments

Data Systems in India - Case for an Audacious Overhaul

This blog post, written by Adrash Matthew, is the outcome of a discussion with Mr TCA Srinivasa Raghavan on the current state of data systems in India and the case for an audacious overhaul in light of the ‘Digital India’ initiative. Mr Raghavan is a Senior Associate Editor at The Hindu Business Line. He has been a consultant to the Reserve Bank of India's history project and an advisor to the Director of ICRIER. He is also a Distinguished Fellow of the Institute of Peace and Conflict…

Continue

Added by Athena Infonomics on October 17, 2015 at 2:00am — No Comments

How to track and visualize data lineage

Data lineage is about tracking the flow of information. It is necessary to guarantee the quality, usability and security of your data. For large organizations, it is also a key conformity requirement. With Linkurious, it is possible to use a graph-based approach to solve these challenges.

What is data lineage?

The success of an organization depends on the quality, usability and security of its data. Want to provide amazing support to your customer? Create new products…

Continue

Added by Jean Villedieu on October 13, 2015 at 1:47am — No Comments

How NASA experiments with knowledge discovery

NASA is using big data to make complex knowledge more readily available. Learn how graph visualization can help turn a large corpus of documents into concrete insights.

A database of lessons learned

Even in a mature and knowledge-driven organization like NASA, finding an answer to a common business issue can be frustrating. Past surveys at NASA have shown that most people have trouble finding the…

Continue

Added by Jean Villedieu on August 31, 2015 at 7:14am — No Comments

Find out how the Onam celebrations at Athena Infonomics turned into a lesson in analytics by our data science geeks



It was a customary Onam in office with a colourfully arranged Onam Pookalam adorning the entrance. Until Dr. Ram walked in, took one look at the pookalam and claimed he could see a schematic expansion series of squares, circles and triangles embedded in concentric circles!
Dr. Ram explaining the schematic expansion series…
Continue

Added by Athena Infonomics on August 31, 2015 at 12:00am — No Comments

Can Big Data solve the skill vs. luck 'mystery' of fantasy sports?

Daily Fantasy Sports (DFS) is a relatively new phenomenon. Some say it's definitely a game of skill, others claim it's just another form of gambling and shouldn't be legal.

DFS, "the next big thing", is taking North America by storm and slowly knocking on Europe's doors. The way it works is simple: sports lovers select a team of real world athletes…

Continue

Added by Jure Rejec on July 28, 2015 at 7:00am — No Comments

Is opensource enough? The Need for an Open Architecture Analytics

The job of a data analyst has become very broad: it now has to cover a number of different and ever-changing tasks.

A data analyst must query a variety of internal and external data sources, each with a different access protocol and format; integrate these data with results from REST and web services queried over the Internet, such as Google API or any social media channel; exchange information with business analysts, who, while lacking the deep mathematical background,…

Continue

Added by Rosaria Silipo on July 14, 2015 at 2:27am — No Comments

Random Forest in Python

Random Forest is a machine learning algorithm used for classification, regression, and feature selection. It's an ensemble technique, meaning it combines the outputs of many instances of one weaker technique in order to get a stronger result.

The weaker technique in this case is a decision tree. Decision trees work by splitting and re-splitting the data by…

Continue

Added by Alex Woods on July 4, 2015 at 8:30am — No Comments
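
The ensemble idea in the Random Forest excerpt above can be sketched in a few lines of Python. This is only an illustration of the voting step, not the post's own code: the three hand-written stumps, their thresholds, and the names `majority_vote` and `stumps` are hypothetical, and a real random forest would learn many full decision trees from random samples of the data rather than use fixed rules.

```python
from collections import Counter

# Three hypothetical "weak" classifiers (one-split decision stumps).
# Each looks at a single feature and applies a made-up threshold.
def stump_height(sample):
    return "tall" if sample["height_cm"] > 180 else "short"

def stump_shoe(sample):
    return "tall" if sample["shoe_size"] > 44 else "short"

def stump_weight(sample):
    return "tall" if sample["weight_kg"] > 80 else "short"

stumps = [stump_height, stump_shoe, stump_weight]

def majority_vote(classifiers, sample):
    """Return the label predicted by most of the classifiers."""
    votes = Counter(clf(sample) for clf in classifiers)
    return votes.most_common(1)[0][0]

# One stump votes "short" (178 cm), two vote "tall", so the ensemble says "tall".
sample = {"height_cm": 178, "shoe_size": 45, "weight_kg": 85}
prediction = majority_vote(stumps, sample)
```

No single stump is reliable on its own; the combined vote is what gives the ensemble its strength.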

6 Tips for Being an Awesome Data Scientist

In 2012, Harvard Business Review cited data scientist as the sexiest job of the 21st century. Just two months ago LinkedIn shared the “25 Hottest Skills that Got People Hired in 2014” – guess what type of workers possessed these skills? This attention has been followed with a slew of articles telling budding analysts the skills they’ll need to get to the top of the data scientist food…

Continue

Added by Elana Roth on June 29, 2015 at 3:00am — No Comments

Data scientists are wasting their time

We all know that time is money, especially when you're paying a data scientist. But the New York Times reports that...

"Data scientists, according to interviews and expert estimates, spend 50 percent to 80 percent of their time mired in [the] mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets."…

Continue

Added by Jennifer Methvin on June 25, 2015 at 4:30am — 4 Comments

Accessing Big Data with KNIME

Continue

Added by Rosaria Silipo on May 6, 2015 at 12:30am — No Comments

Analytics: No Pain, No Gain

      

"Analytics is a journey and not a destination!! It takes considerable effort to frame that journey and execute it with a sense of purpose. You will encounter stumbling blocks that may threaten your initiative but you need to find a way out and keep marching ahead."

We recently did an analytics exercise for a US client in the education domain that had all the flavors of roadblocks…

Continue

Added by Rohit Pandey on April 30, 2015 at 12:29am — 2 Comments

Bigdata and Website Analysis

Why use big data tools to analyse web analytics data?

Because web event data is incredibly valuable. It tells us how our customers actually behave (in lots of detail), and how that varies. We can also do analysis between different customers or for the same customers over time.…

Continue

Added by VINU KIRAN .S on January 31, 2015 at 5:19am — No Comments

esProc Helps Process Structured Text in Java - Import data into the database

When importing structured text files into a database using Java alone, we need to assemble the SQL statements manually and deal with various troublesome situations: whether to update or insert when the data already exists in a table, whether certain fields are included in the file, and whether the fields in the file are consistent with those in the table.

 

When esProc is used alongside Java, these problems can be solved…

Continue

Added by Lynn Guo on December 22, 2014 at 7:04pm — No Comments

esProc Helps Process Structured Texts in Java – Handle Big Files in Groups

Some text files are too big to be loaded into memory in their entirety, yet because the data have been sorted by a certain column, they can be imported in groups according to this column, and each group fits into memory for computing. Examples include the call detail records of a telecom company, visitor statistics for a website, member information for a shopping mall, etc.

 

A great deal of complicated code, which is difficult to maintain, is…

Continue

Added by Lynn Guo on December 15, 2014 at 6:24pm — No Comments

Data Scientist, The Magician?

Isn't it true? Isn't this what most business folks and CXOs expect from their data science teams? Yes, in fact, this is what we have been told at nearly all of the conferences we attend. Thumping case studies and a feverish pitch make one believe in the story, and you walk out with a true sense of achieving the same…
Continue

Added by Vikas Kamra on December 1, 2014 at 4:30am — No Comments

Processing Structured Text in Java–Conditional Filtering

The following problems arise if you perform conditional filtering on text files in Java alone:

1. The text file is not a database, so it cannot be accessed by SQL. The code needs to be modified whenever the filtering conditions change. Besides, if you want conditional filtering as flexible as that in SQL, you have to implement dynamic expression parsing and evaluation yourself, which is a great amount of programming work.

2. Stepwise loading is required for big files that…

Continue

Added by Lynn Guo on November 23, 2014 at 6:00pm — No Comments

esProc Helps Process Structured Texts in Java – Set Operations

Java doesn’t support set operations directly, so nested loops have to be used to implement intersection, union, complement, etc. between text files. If there are many text files, or the file to be computed is too big to be loaded into memory, or set operations must be performed on multiple fields, the code becomes even more complicated. However, with the assistance of esProc, which supports set operations…

Continue

Added by Lynn Guo on November 13, 2014 at 6:00pm — No Comments
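
For readers unfamiliar with the set operations named in the excerpt above, here is a minimal Python sketch of intersection, union, and (relative) complement over the lines of two text files. It is not esProc or Java code and does not come from the post; the helper `line_set` and the sample lists standing in for file contents are hypothetical, and it assumes both inputs fit in memory, which is exactly the assumption that fails in the big-file case the excerpt mentions.

```python
def line_set(lines):
    """Build a set of non-empty, whitespace-trimmed lines."""
    return {line.strip() for line in lines if line.strip()}

# Hypothetical contents of two small text files, one record per line.
file_a = ["alice\n", "bob\n", "carol\n"]
file_b = ["bob\n", "carol\n", "dave\n", "\n"]

a = line_set(file_a)
b = line_set(file_b)

intersection = a & b   # records present in both files
union = a | b          # records present in either file
complement = a - b     # records in file A that are not in file B
```

To compare on multiple fields, each line could be split and reduced to a tuple of the key fields before being added to the set.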


© 2019   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC