DATA SCIENCE

Module 1 – Data Science Project Lifecycle

  • Recap of Demo
  • Introduction to Types of Analytics
  • Project life cycle

Module 2 – Introduction to Python, R and Basic Statistics

  • Installation of Python IDE
  • Anaconda and Spyder
  • Working with Python: basic commands and examples
  • Introduction to R and RStudio with some basics

Various graphical techniques to understand data

  • Bar plot
  • Histogram
  • Box plot
  • Scatter plot
  • Data types: continuous, discrete, categorical, count, qualitative and quantitative, with their identification and application; further classification of data into Nominal, Ordinal, Interval and Ratio types
  • Random Variable and its definition
  • Probability and Probability Distribution – Continuous probability distribution / Probability density function and Discrete probability distribution / Probability mass function

Basic Statistics

  • Various sampling techniques 
  • Measures of central tendency
    • Mean / Average
    • Median
    • Mode
  • Measures of Dispersion
    • Variance
    • Standard Deviation
    • Range
  • Expected value of probability distribution
  • Measure of Skewness
  • Measure of Kurtosis
  • Normal Distribution
  • Standard Normal Distribution / Z distribution
  • Z scores and Z table
  • QQ Plot / Quantile-Quantile plot
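
As an illustration of the Z score and Z table topics above, here is a minimal Python sketch using only the standard library. The function name `z_scores` and the sample values are hypothetical, chosen for illustration, and the population standard deviation is assumed.

```python
from statistics import NormalDist, mean, pstdev

def z_scores(values):
    """Standardize values: z = (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical sample data, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]
zs = z_scores(data)

# A Z table lookup corresponds to the standard normal CDF:
# the probability that a standard normal value falls below z
p_below = NormalDist().cdf(1.96)

print(zs)
print(round(p_below, 4))
```

A Z table read in the other direction (probability to Z value) corresponds to `NormalDist().inv_cdf`.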

Advanced Statistics

  • Sampling Variation
  • Central Limit Theorem
  • Sample size calculator
  • T-distribution / Student's t-distribution
  • Confidence interval
    • Population parameter – Standard deviation known
    • Population parameter – Standard deviation unknown
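
For the "standard deviation known" case above, a confidence interval for the mean can be sketched as follows. This is a minimal example under stated assumptions: the function name `z_confidence_interval`, the sample and the value of `sigma` are hypothetical. When the standard deviation is unknown, the Z critical value would be replaced by a critical value from the t-distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_confidence_interval(sample, sigma, confidence=0.95):
    """Interval for the mean when the population standard deviation is known:
    mean +/- z * sigma / sqrt(n)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    m = mean(sample)
    margin = z * sigma / sqrt(len(sample))
    return m - margin, m + margin

# Hypothetical sample and known sigma, for illustration only
low, high = z_confidence_interval([48, 52, 50, 49, 51], sigma=2.0)
print(low, high)
```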

Module 3 – Hypothesis Testing

Introduction to hypothesis testing and the main test statistics; understanding the null hypothesis, the alternative hypothesis and the types of hypothesis tests.

  • Type I and Type II errors
  • ANOVA
  • Chi-Square test

High-Level overview of Machine Learning

  • Supervised Learning
    • Classifier
    • Regression
  • Unsupervised Learning
    • Clustering

Supervised – Classifiers

Module 4 – Machine Learning Classifiers – KNN

Module 5 – Classifier – Naive Bayes

Module 6 – Decision Tree

Module 7 – Logistic Regression

  • Simple Logistic Regression
  • Multiple Logistic Regression
  • Confusion matrix
    • False Positive, False Negative
    • True Positive, True Negative
    • Sensitivity (Recall), Specificity, F1
  • Receiver operating characteristic (ROC) curve
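
The confusion-matrix measures listed above can be computed directly from the four cell counts. The sketch below is illustrative only: the function name `classification_metrics` and the counts are hypothetical, and precision is included because F1 is defined from precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # share of predicted positives that are correct
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts, for illustration only
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```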

Module 8 – Bagging And Boosting

Module 9 – Black Box Methods

  • Network Topology
  • Support Vector Machines

Module 10 – Survival Analysis

  • Concept with a business case

Module 11 – Forecasting

  • ARMA (Auto-Regressive Moving Average), Order p and q
  • ARIMA (Auto-Regressive Integrated Moving Average), Order p, d and q

Supervised – Regression

Module 12 – Linear Regression

  • Scatter Diagram
  • Correlation Analysis
  • Principles of Regression
  • Ordinary least squares
  • Simple Linear Regression
  • Understanding Overfitting (Variance) vs Underfitting (Bias)
  • LINE assumption
    • Collinearity (Variance Inflation Factor)
    • Linearity
    • Normality
  • Multiple Linear Regression

Module 13 – Polynomial Regression

Module 14 – Decision Tree & Random Forest

Module 15 – Regularization Techniques

  • Lasso and Ridge Regression

Module 16 – Multinomial Regression

  • Logit and Log Likelihood
    • Category Baselining
    • Modeling Nominal categorical data

Data Mining Unsupervised – Clustering

Module 17 – Data Mining Unsupervised – Clustering

  • Hierarchical Clustering / Agglomerative Clustering
  • K-Means Clustering

Module 18 – Dimension Reduction

  • Why dimension reduction
  • Advantages of PCA
  • Calculation of PCA weights
  • 2D Visualization using Principal components
  • Basics of Matrix algebra
  • SVD – Decomposition of matrix data

Module 19 – Data Mining Unsupervised – Network Analytics

  • Definition of a network (the LinkedIn analogy)
  • Introduction to Google PageRank

Module 20 – Data Mining Unsupervised – Association Rules

  • What is Market Basket / Affinity Analysis
  • Measures of association
    • Support
    • Confidence
    • Lift Ratio
  • Apriori Algorithm
  • Sequential Pattern Mining
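
The three measures of association above (support, confidence and lift ratio) can be illustrated for a single rule "antecedent → consequent". This is a minimal sketch, not the Apriori algorithm itself: the function name `rule_measures` and the baskets are hypothetical, and it assumes the antecedent and the consequent each occur in at least one transaction.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= set(t))
    count_c = sum(1 for t in transactions if c <= set(t))
    count_both = sum(1 for t in transactions if (a | c) <= set(t))
    support = count_both / n             # P(A and C)
    confidence = count_both / count_a    # P(C | A)
    lift = confidence / (count_c / n)    # P(C | A) / P(C)
    return support, confidence, lift

# Hypothetical market baskets, for illustration only
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
support, confidence, lift = rule_measures(baskets, {"bread"}, {"milk"})
print(support, confidence, lift)
```

A lift ratio above 1 suggests the items appear together more often than would be expected if they were independent; a value below 1 suggests the opposite.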

Module 21 – Data Mining Unsupervised – Recommender System

Module 22 – Text Mining

Module 23 – Natural Language Processing

Assignments/Projects/Placement Support