A Data Science Central Community
Machine learning automatically recognizes complex, previously unknown, and useful patterns in all types of data. Data-driven algorithms are becoming increasingly important, and their results generally improve as the amount of data increases. Machine learning algorithms are used in search engines, image analysis, multimedia database retrieval, bioinformatics, industrial automation, speech recognition, and many other fields. This survey course covers the concepts and principles of a wide variety of data mining methods, gives you a working knowledge of these techniques, and prepares you to apply them to real problems.
The statistical programming language R is used to implement the machine learning algorithms. The instructor will introduce the R language through examples from data exploration, graphics, and statistical analysis. The course covers supervised learning, which requires labeled training data. The supervised techniques include several types of linear regression, decision trees, k-nearest neighbors, Naive Bayes, support vector machines, and ensemble methods. Unsupervised machine learning, including clustering techniques, and other advanced topics will be covered in separate follow-up courses.
Students will complete a data mining project using the supervised algorithms learned in class. You are expected to be moderately proficient in computer programming and to have an elementary background in probability, statistics, linear algebra, and calculus. The R language will be used for class examples and homework assignments, so some prior knowledge of R is helpful. The instructor will provide reference materials on math and statistics for those who need a refresher; these will not be reviewed in class.
Skills Needed: Moderate level of computer programming proficiency, elementary understanding of probability, statistics, linear algebra, and calculus.