
#### DATA SCIENCE Course Syllabus:

Basic Concepts of Statistics:

Descriptive Statistics and Probability Distributions:

**Introduction to Statistics**

- Different Types of Variables
- Measures of Central Tendency with examples
  - Mean
  - Mode
  - Median
- Measures of Dispersion
  - Range
  - Variance
  - Standard Deviation
- Probability & Distributions
  - Probability Basics
  - Binomial distribution and its properties
  - Poisson distribution and its properties
  - Normal distribution and its properties
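
The measures of central tendency and dispersion listed above can be illustrated with a minimal sketch. It is written in Python with only the standard library (the course itself uses R), and the `scores` data is a hypothetical sample chosen purely for illustration. Population variance and standard deviation are used here; the sample versions would divide by n - 1 instead.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion (population versions: divide by n)
value_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, variance={variance}, std_dev={std_dev}")
```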

**Inferential Statistics and Testing of Hypothesis**

- Sampling methods
  - Sampling and types of sampling
  - Definitions of Sample and Population
  - Importance of sampling in practice
  - Different methods of sampling
    - Simple Random Sampling with replacement and without replacement
    - Stratified Random Sampling
- Different methods of estimation
- Testing of Hypothesis & Tests
  - Null Hypothesis and Alternative Hypothesis
  - Level of Significance and p-value
  - t-test and its properties
  - Chi-square test and its properties
  - Z-test
- Analysis of Variance
  - F-test
  - One-way and two-way ANOVA
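
As a rough illustration of the sampling methods named in this section, the sketch below draws simple random samples with and without replacement and a stratified random sample. It is Python rather than the course's R, and the population, the strata names (`low`, `high`), and the sample sizes are all assumptions made up for the example.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = list(range(1, 21))  # hypothetical population of 20 unit IDs

# Simple random sampling WITHOUT replacement: each unit appears at most once.
srs_without = rng.sample(population, k=5)

# Simple random sampling WITH replacement: the same unit may be drawn again.
srs_with = rng.choices(population, k=5)

# Stratified random sampling: split the population into strata, then
# sample independently inside each stratum (here, 2 units per stratum).
strata = {"low": population[:10], "high": population[10:]}
stratified = {name: rng.sample(units, k=2) for name, units in strata.items()}

print(srs_without, srs_with, stratified)
```

Sampling inside each stratum guarantees that every group is represented, which a single simple random sample does not.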

**Covariance & Correlation**

- Importance and Properties of Correlation
- Types of Correlation with examples
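
To make the link between covariance and correlation concrete, here is a minimal Python sketch (the course itself uses R). The helper names `covariance` and `pearson_r` and the small data sets are hypothetical and exist only for this example. Pearson's r is the covariance rescaled by both standard deviations, so it always falls between -1 and 1.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of std devs."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: one perfectly increasing and one perfectly decreasing pair.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

print(pearson_r(hours, rising))   # positive correlation
print(pearson_r(hours, falling))  # negative correlation
```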

**Predictive Modeling Steps and Methods with a Live Example:**

- Data Preparation
  - Variable Selection
  - Transformation of the variables
  - Normalization of the variables
- Exploratory Data Analysis
  - Summary Statistics
  - Understanding the patterns of the data in one and many dimensions
  - Missing data treatment using different methods
  - Identifying and treating outliers
  - Visualization of the data using chart types suited to its dimensions
    - Bar chart, Histogram, Box plot, Scatter plot, Bubble chart, Word cloud, etc.
- Model Development
  - Selection of the sample data
  - Selecting the appropriate model based on the rule and data availability
- Model Validation
  - Checking key statistical parameters
  - Validating the model results against the actual results
- Model Implementation
  - Implementing the model for future prediction
- Real-world telecom business use case with detailed explanation
- Introducing a couple of real-world use cases

**Supervised Techniques:**

- Multiple Linear Regression
  - Linear Regression – Introduction – Applications
  - Assumptions of Linear Regression
  - Building Linear Regression Model
  - Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
  - Validation of Linear Regression Models (Re-running vs. Scoring)
  - Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers, etc.)
  - Interpretation of Results – Business Validation – Implementation on new data
  - Real-world case: modeling Manufacturing and Telecom industry revenue
- Logistic Regression
  - Logistic Regression – Introduction – Applications
  - Linear Regression vs. Logistic Regression vs. Generalized Linear Models
  - Building Logistic Regression Model
  - Standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, etc.)
  - Validation of Logistic Regression Models (Re-running vs. Scoring)
  - Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, drivers, etc.)
  - Interpretation of Results – Business Validation – Implementation on new data
  - Real-world case study: predicting customer churn in the Banking and Retail industries
- Partial Least Squares Regression
  - Partial Least Squares Regression – Introduction – Applications
  - Difference between Linear Regression and Partial Least Squares Regression
  - Building PLS Model
  - Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
  - Interpretation of Results – Business Validation – Implementation on new data
  - Real-world example: identifying the key factors that drive revenue

**Variable Reduction Techniques**

- Factor Analysis
- Principal Component Analysis (PCA)
- Assumptions of PCA
- Working Mechanism of PCA
- Types of Rotations
- Standardization
- Positives and Negatives of PCA

**Supervised Techniques Classification:**

- CHAID
- CART
- Difference between CHAID and CART
- Random Forest
- Decision tree vs. Random Forest
- Data Preparation
- Missing data imputation
- Outlier detection
- Handling imbalanced data
- Random Record selection
- Random Forest R parameters
- Random Variable selection
- Optimal number of variables selection
- Calculating Out-of-Bag (OOB) error rate
- Calculating Out-of-Bag (OOB) predictions
- A couple of real-world use cases from the Telecom and Retail industries: identifying customer churn

**Unsupervised Techniques:**

- Segmentation for Marketing Analysis
- Need for segmentation
- Criterion of segmentation
- Types of distances
- Clustering algorithms
- Hierarchical clustering
- K-means clustering
- Deciding number of clusters
- Case study
- Business Rules Criteria
- Real-world use case: identifying the most valuable revenue-generating customers

**Time series Analysis:**

- Forecasting – Introduction – Applications
- Time Series Components (Trend, Seasonality, Cyclicity, and Level) and Decomposition
- Basic Techniques
  - Averages
  - Smoothing
- Advanced Techniques
  - AR Models
  - ARIMA
  - UCM
  - Hybrid Model
- Understanding Forecasting Accuracy – MAPE, MAD, MSE, etc.
- A couple of use cases: forecasting future product sales
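
The basic averaging technique and the accuracy measures listed above (MAD, MSE, MAPE) can be sketched as follows. This is Python rather than the course's R. The function names and the `actual` and `forecast` figures are hypothetical, and `mape` assumes that no actual value is zero.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series) + 1)
    ]

def mad(actual, forecast):
    """Mean absolute deviation between actual and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mse(actual, forecast):
    """Mean squared error between actual and forecast values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical sales figures and forecasts, for illustration only.
actual = [100, 200, 400]
forecast = [110, 180, 400]

print(moving_average([10, 20, 30, 40], window=2))
print(mad(actual, forecast), mse(actual, forecast), mape(actual, forecast))
```

MAD and MSE are expressed in the units of the series (or their square), while MAPE is scale-free, which makes it easier to compare across products with very different sales volumes.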

**Text Analytics:**

- Gathering text data from the web and other sources
- Processing raw web data
- Collecting Twitter data with Twitter API
- Naïve Bayes Algorithm
- Assumptions of Naïve Bayes
- Processing of Text data
- Handling Standard and Text data
- Building Naïve Bayes Model
- Understanding standard model metrics
- Validation of the Models (Re-running vs. Scoring)
- Sentiment analysis
- Goal Setting
- Text Preprocessing
- Parsing the content
- Text refinement
- Analysis and Scoring
- Healthcare industry use case based on data extracted from Twitter

**Visualization Using Tableau:**

- Live connectivity from R to Tableau
- Generating the Reports and Charts

**R PROGRAMMING**

**SESSION 1: Getting Started with R**

- What is statistical programming?
- The R package
- Installation of R
- The R command line
- Function calls, symbols, and assignment
- Packages
- Getting help on R
- Basic features of R
- Calculating with R

**SESSION 2: Matrices, Array, Lists, and Data Frames**

- Character vectors
- Operations on the logical vectors
- Creating the matrices and operations on it
- Creating the array and operations on it
- Creating the lists and operations on it
- Making data frames
- Working with data frames

**SESSION 3: Getting Data in and out of R**

- Importing Data into R
- Exporting Data from R
- Copy Data from Excel to R
- Loading and Saving Data with R
- Importing different types of file formats

**SESSION 4: Data Manipulation and Exploration:**

- Variable transformations
- Creating Dummy variables
- Data set options (Rename, Label)
- Keep / Drop Columns
- Identification and Dealing with the Missing data
- Sorting the data
- Handling the Duplicates
- Joining and Merging (Inner, Left, Right and Cross Join)
- Calculating Descriptive Statistics
- Summarize numeric variables
- Summarize factor variables
- Transpose Data
- Aggregated functions using Group by
- dplyr and data.table packages for data manipulation
- Data preparation using the sqldf package

**SESSION 5: Conditional Statements and Loops:**

- If Else
- Nested If Else
- For Loop
- While Loop

**SESSION 6: Functions:**

- Character Functions
- Numeric Functions
- Apply Function on Rows
- Converting a factor to integer
- Indexing Operators in List

**SESSION 7: Graphical procedures**

- Pie chart
- Bar Chart
- Box plot
- Scatter plot
- Multi Scatter plot
- Word cloud, etc.

**SESSION 8: Advanced R and Real-world analytics examples:**

- Data extraction from Twitter
- Text Data handling
- Positive and Negative word cloud
- Required packages for the analytics
- Sentiment analysis using a real-world example
- R code automation
- Time series analysis with real-world Telecom data

Couple of examples with the