Simple linear regression is used to predict a quantitative outcome p on the basis of a single predictor variable q. Its main goal is to build a mathematical model (or formula) that defines p as a function of the q variable.

Once we have built a statistically significant model, we can use it to predict future outcomes on the basis of new q values.

For example, suppose we want to analyze the impact of the advertising budgets of five media (YouTube, Facebook, Instagram, Twitter and newspaper) on future sales.

Formula and basics

The mathematical formula of the linear regression can be written as p = b0 + b1*q + e, where:

  • b0 and b1 are known as the regression beta coefficients or parameters
  • b0 is the intercept of the regression line
  • b1 is the slope of the regression line
  • e is the error term

The figure below illustrates the linear regression model, where:

  • the best-fit regression line is in red
  • the intercept (b0) and the slope (b1) are shown in black
  • the error terms (e) are the distances of the data points away from the best-fit regression line
[Figure: scatter plot of the data points with the best-fit regression line]

From the scatter plot above, it can be seen that not all the data points fall exactly on the fitted regression line. Some points lie above the red line and some lie below it; overall, the residual errors (e) have a mean of approximately zero.
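To make the fitting idea concrete, here is a minimal sketch in Python of how b0 and b1 can be estimated with the ordinary least squares formulas. The data values and the function name fit_simple_linear_regression are made up for illustration and are not taken from any real advertising data set.

```python
# Hypothetical data: q is the predictor, p is the outcome (made-up numbers).
q = [1.0, 2.0, 3.0, 4.0, 5.0]
p = [2.2, 4.1, 5.9, 8.3, 9.8]


def fit_simple_linear_regression(q, p):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(q)
    q_mean = sum(q) / n
    p_mean = sum(p) / n
    # Slope: co-variation of q and p divided by the variation of q.
    sxy = sum((qi - q_mean) * (pi - p_mean) for qi, pi in zip(q, p))
    sxx = sum((qi - q_mean) ** 2 for qi in q)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = p_mean - b1 * q_mean
    return b0, b1


b0, b1 = fit_simple_linear_regression(q, p)

# Residual errors e: observed outcome minus the value on the fitted line.
residuals = [pi - (b0 + b1 * qi) for qi, pi in zip(q, p)]
mean_residual = sum(residuals) / len(residuals)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, mean residual = {mean_residual:.2e}")
```

With a least squares fit that includes an intercept, the residuals average out to zero (up to rounding), which matches the observation about the scatter plot.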

Residual Sum of Squares (RSS): the sum of the squares of the residual errors.

Residual Standard Error (RSE): the average variation of the points around the fitted regression line.

The RSE is one of the important metrics used to evaluate the overall quality of the fitted regression model. The lower the Residual Standard Error, the more accurate the model.
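The two metrics can be sketched in a few lines of Python. The data below is made up, and the RSE is computed with the usual convention for simple linear regression, the square root of RSS divided by (n - 2); treat that convention as an assumption of this sketch, since the text above does not spell out the formula.

```python
import math

# Hypothetical data: q is the predictor, p is the outcome (made-up numbers).
q = [1.0, 2.0, 3.0, 4.0, 5.0]
p = [2.2, 4.1, 5.9, 8.3, 9.8]

# Least squares estimates of the slope (b1) and intercept (b0).
n = len(q)
q_mean = sum(q) / n
p_mean = sum(p) / n
b1 = sum((qi - q_mean) * (pi - p_mean) for qi, pi in zip(q, p)) / sum(
    (qi - q_mean) ** 2 for qi in q
)
b0 = p_mean - b1 * q_mean

# Residual errors: observed outcome minus fitted value.
residuals = [pi - (b0 + b1 * qi) for qi, pi in zip(q, p)]

# Residual Sum of Squares: sum of the squared residual errors.
rss = sum(e ** 2 for e in residuals)

# Residual Standard Error, assuming n - 2 degrees of freedom
# (two coefficients, b0 and b1, were estimated from the data).
rse = math.sqrt(rss / (n - 2))

print(f"RSS = {rss:.4f}, RSE = {rse:.4f}")
```

A smaller RSE means the points sit closer to the fitted line, in the same units as the outcome p.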

Since the mean of the error term is 0, the outcome variable p can be approximately estimated as follows:

  p ~ b0 + b1*q
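As a small illustration of using this approximation for prediction, the Python sketch below plugs new q values into the line. The coefficient values and the function name predict are hypothetical and chosen only for the example.

```python
# Hypothetical coefficients, assumed to come from an earlier model fit.
b0 = 0.24  # intercept
b1 = 1.94  # slope


def predict(q_new, b0, b1):
    """Estimate the outcome p for a new predictor value q_new."""
    # The error term is dropped because its mean is zero.
    return b0 + b1 * q_new


new_q_values = [6.0, 7.5]
predictions = [predict(q_new, b0, b1) for q_new in new_q_values]

for q_new, p_hat in zip(new_q_values, predictions):
    print(f"q = {q_new}: predicted p = {p_hat:.2f}")
```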

There is a growing trend of learning Data Science through R training in Mumbai. In this blog, our instructor gives a fair idea of Text Mining as covered in our Data Science training. We have always encouraged our students to learn R practically.

Text mining is an element of data science that allows us to highlight the most frequently used keywords in a given set of data.

A word cloud, also referred to as a text cloud or tag cloud, is a visual representation of text data. The size of each word is related to its frequency: the more frequent the word, the bigger it is drawn, and the less frequent, the smaller.
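The link between frequency and word size can be sketched as a simple linear scaling. The Python snippet below is only an illustration: the function name word_sizes, the font size range and the keyword counts are all assumptions, and real word cloud packages may use a different scaling rule.

```python
def word_sizes(frequencies, min_size=10.0, max_size=40.0):
    """Map each word's frequency to a font size by linear scaling."""
    lowest = min(frequencies.values())
    highest = max(frequencies.values())
    if highest == lowest:
        # All words are equally frequent, so give them all the same size.
        return {word: max_size for word in frequencies}
    scale = (max_size - min_size) / (highest - lowest)
    return {
        word: min_size + (count - lowest) * scale
        for word, count in frequencies.items()
    }


# Hypothetical keyword counts, made up for illustration.
counts = {"data": 5, "science": 3, "mining": 1}
sizes = word_sizes(counts)

for word, size in sizes.items():
    print(f"{word}: font size {size:.0f}")
```

The most frequent word gets the largest size and the least frequent word gets the smallest, with everything else in between.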

Creating word clouds is very simple in R if you know the procedure, or steps, to execute. The text mining (tm) and word cloud generator (wordcloud) packages are available in R to help us analyze texts and quickly visualize the keywords as a word cloud.

Why does industry prefer word clouds to present data?

·     They add simplicity and clarity to the process.

·     Frequently used keywords stand out better in a word cloud.

·     They are easier to understand and more visually engaging than tabular data.

Applications of word clouds:

  • Researchers
  • Marketing Decisions
  • Politicians and Journalists
  • Cyber Crime
  • Many More

Let’s understand this quickly with an example:

John eats mango. John plays football. Nadal eats mango. Nadal plays tennis.

Step 1: – Find the master set (list) of words.

E.g. John, eats, mango, plays, football, Nadal, tennis.

This is the complete master set of words.
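Step 1 can be sketched in Python as follows. This is a minimal illustration that lowercases every word and splits on full stops and spaces; the variable names are made up, and a real text mining workflow would usually also remove stop words and punctuation more carefully.

```python
text = "John eats mango. John plays football. Nadal eats mango. Nadal plays tennis."

# Split the text into statements on full stops and drop empty pieces.
statements = [part.strip() for part in text.split(".") if part.strip()]

# Collect each distinct word once, in order of first appearance.
master_set = []
for statement in statements:
    for word in statement.lower().split():
        if word not in master_set:
            master_set.append(word)

print(master_set)
```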

Step 2: – Tagging presence.

Statement  John  Eats  Mango  Plays  Football  Nadal  Tennis
State1     1     1     1      0      0         0      0
State2     1     0     0      1      1         0      0
State3     0     1     1      0      0         1      0
State4     0     0     0      1      0         1      1
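The presence tagging of Step 2 can be sketched in Python like this. The statements and master set are typed in directly so that the snippet runs on its own, and the variable names are chosen only for illustration.

```python
statements = [
    "John eats mango",
    "John plays football",
    "Nadal eats mango",
    "Nadal plays tennis",
]
master_set = ["john", "eats", "mango", "plays", "football", "nadal", "tennis"]

# One row per statement: 1 if the word occurs in it, 0 otherwise.
presence = []
for statement in statements:
    words = set(statement.lower().split())
    presence.append([1 if word in words else 0 for word in master_set])

for number, row in enumerate(presence, start=1):
    print(f"State{number}", row)
```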

Step 3: – Frequency calculations, i.e. column-wise sums of the presence table.

John  Eats  Mango  Plays  Football  Nadal  Tennis
2     2     2      2      1         2      1
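The column-wise calculation of Step 3 can be sketched in Python as below. The presence rows are typed in directly from the four example statements so that the snippet is self-contained; the variable names are illustrative.

```python
master_set = ["john", "eats", "mango", "plays", "football", "nadal", "tennis"]

# Presence rows for the four example statements (1 = word occurs).
presence = [
    [1, 1, 1, 0, 0, 0, 0],  # John eats mango
    [1, 0, 0, 1, 1, 0, 0],  # John plays football
    [0, 1, 1, 0, 0, 1, 0],  # Nadal eats mango
    [0, 0, 0, 1, 0, 1, 1],  # Nadal plays tennis
]

# zip(*presence) turns the rows into columns, so each sum is one word's total.
frequencies = {
    word: sum(column) for word, column in zip(master_set, zip(*presence))
}

print(frequencies)
```

These totals are what a word cloud uses to decide how large to draw each word.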

Step 4: – Create a word cloud to visualize your data and help management take the right decision.

Learn more about Word Cloud and Text Mining in our Data Science course in Mumbai.