Speaker: Rafal Lukawiecki
Data mining is about exploring data and finding correlations in it. It can also be used to make predictions and to find patterns. But prediction does not mean predicting the future: predicting the future would require the strong assumption that nothing will change around you.
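To make "finding correlations" concrete, here is a minimal sketch in Python that computes a Pearson correlation coefficient by hand. It was not shown in the session; the function name `pearson` and the sample figures (store visits and spend) are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant sequence has zero variance and would divide by zero.
    return cov / sqrt(var_x * var_y)

# Hypothetical data: monthly store visits and monthly spend per customer.
visits = [1, 2, 3, 4, 5]
spend = [12.0, 19.0, 31.0, 42.0, 48.0]
r = pearson(visits, spend)
print(f"correlation between visits and spend: {r:.3f}")
```

A value close to 1 only says that the two columns moved together in the data at hand. It says nothing about what happens once conditions change, which is the caveat about predicting the future.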
Predictive analytics is about understanding customers and building effective marketing campaigns.
To do data mining, the data must have some structure: attributes, flags, etc. You also have to flatten (de-normalize) the data structures, which potentially means a lot of data with a lot of different columns.
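As an illustration of flattening, the sketch below turns two normalized tables into one wide row per customer, with one column per product category. It uses plain Python rather than SQL Server or Excel, and the table layout, column names and the `flatten` helper are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical normalized data: one customer table, one orders table.
customers = [
    {"customer_id": 1, "country": "CH", "newsletter": True},
    {"customer_id": 2, "country": "PL", "newsletter": False},
]
orders = [
    {"customer_id": 1, "category": "books", "amount": 30.0},
    {"customer_id": 1, "category": "music", "amount": 12.5},
    {"customer_id": 1, "category": "books", "amount": 20.0},
    {"customer_id": 2, "category": "music", "amount": 8.0},
]

def flatten(customers, orders):
    """Return one wide row per customer, with one spend column per category."""
    categories = sorted({o["category"] for o in orders})
    # Total spend per (customer, category) pair.
    spend = defaultdict(float)
    for o in orders:
        spend[(o["customer_id"], o["category"])] += o["amount"]
    rows = []
    for c in customers:
        row = dict(c)  # copy the customer attributes
        for cat in categories:
            # Customers with no order in a category get 0.0.
            row[f"spend_{cat}"] = spend.get((c["customer_id"], cat), 0.0)
        rows.append(row)
    return rows

flat = flatten(customers, orders)
for row in flat:
    print(row)
```

Every new category adds a column, which is why a flattened data set can become very wide.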
The output can be an analysis, such as a risk of fraud or happiness. Another output can simply be clusters or groups.
Three steps are necessary: defining the model (input and output), training the model, and validating the results, which is likely the most important step.
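The three steps can be sketched with a deliberately tiny model. This is not one of the SQL Server algorithms, only a single-threshold rule written in Python as an assumption for illustration; the case data and the function names `accuracy` and `train` are invented.

```python
# Step 1: define the model.
# Input: order amount. Output: fraud flag. Cases are (amount, is_fraud).
# All figures are hypothetical.
training = [(20, False), (50, False), (75, False), (120, False),
            (400, True), (610, True), (700, True)]
holdout = [(35, False), (90, False), (450, False),
           (520, True), (300, True), (800, True)]

def accuracy(threshold, cases):
    """Share of cases where 'amount > threshold' matches the fraud label."""
    hits = sum((amount > threshold) == is_fraud for amount, is_fraud in cases)
    return hits / len(cases)

def train(cases):
    """Pick the midpoint between neighbouring amounts that fits the cases best."""
    amounts = sorted({amount for amount, _ in cases})
    candidates = [(a + b) / 2 for a, b in zip(amounts, amounts[1:])]
    return max(candidates, key=lambda t: accuracy(t, cases))

# Step 2: train the model on the training cases only.
threshold = train(training)

# Step 3: validate on cases the model has never seen.
print(f"threshold: {threshold}")
print(f"training accuracy: {accuracy(threshold, training):.2f}")
print(f"holdout accuracy:  {accuracy(threshold, holdout):.2f}")
```

The rule fits its own training cases perfectly but does worse on the held-out cases, which is one way to see why validation matters most.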
The data mining engine takes the data and feeds it into a mining model.
On the backend, SQL Server with Analysis Services (SSAS) is required, starting with version 2008. Starting with 2012, SSAS comes in two flavors: multidimensional and tabular. For data mining, however, no cube is needed.
On the frontend, only Excel is needed, plus the free Data Mining Add-ins. The data for the Data Mining Add-ins must reside in the Excel sheet. SQL Server Data Tools can be used to manage data mining projects, and SQL Server Management Studio may be helpful as well.
For model validation and statistics, R is the reference (http://cran.r-project.org/), bringing additional statistical tools not available in Excel or SQL.
An excellent presentation by an excellent, enthusiastic speaker!