A Data Science Central Community

Machine learning automatically identifies complex, previously unknown, and useful patterns and information in many types of data. Data-driven algorithms are increasingly central to modern computing, and their results generally improve as the amount of data increases. Machine learning algorithms are used in search engines, image analysis, multimedia database retrieval, bioinformatics, industrial automation, speech recognition, and many other fields. This survey course covers the concepts and principles of a wide variety of data mining methods, equips you with a working knowledge of these techniques, and prepares you to apply them to real problems.

The statistical programming language R is used to implement the machine learning algorithms. The instructor will introduce R through examples from data exploration, graphics, and statistical analysis. The course covers supervised learning, which requires labeled training data. The supervised techniques include various types of linear regression, decision trees, k-nearest neighbors, Naive Bayes, support vector machines, and ensemble methods. Unsupervised machine learning, including clustering, and other advanced topics will be covered in separate follow-up courses.

Students will complete a data mining project using the supervised algorithms learned in class. You are expected to be moderately proficient in computer programming and to have an elementary-level background in probability, statistics, linear algebra, and calculus. R will be used for class examples and homework assignments, and some prior knowledge of R is helpful. The instructor will provide additional reference material on math and statistics for those who need a refresher; this material will not be reviewed in class.

Topics include:

- Overview of R language
- Exploring data with R
- Various types of linear regression
- Decision trees
- K-nearest neighbors
- Naive Bayes
- Support vector machines
- Ensemble methods

Skills Needed: Moderate level of computer programming proficiency, elementary understanding of probability, statistics, linear algebra, and calculus.

© 2021 TechTarget, Inc.