What is CRISP-DM?
Using process ad nauseam?
What does it contain?
It has two major parts. The first is the Reference Guide, which describes the generic phases and tasks. The second is the User Guide, which provides checklists, questionnaires, tools and techniques that help in the actual project work.
Major phases of CRISP-DM
Below I describe the phases of CRISP-DM, with some commentary from client projects that my cohort has executed at the Data School.
Business Understanding: Determine what type of problem we are solving in technical terms (for example, data classification). Understand the success criteria of the project as defined by the customer. A project plan is a required output of this phase. During client project weeks, we have used client interviews, questionnaires, and an initial review of documentation and data to understand the business problem and determine the success criteria for the project.
Data Understanding: This phase covers data extraction and data exploration, including statistical analysis, to understand the data. In a client project context, understanding or building a partial data model, extracting the data into a tool like Alteryx, and using its data profiling and data investigation tools are a few ways to achieve a good understanding of the data. One can also quickly plot some visualisations in Tableau to understand the data better.
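For readers who prefer code to a tool like Alteryx, the kind of profiling described above can be sketched in a few lines of Python. This is only a minimal illustration, not how the client projects were done: the function names (profile_column, profile_table) and the sample rows are hypothetical, and it assumes the data has already been loaded as a list of dictionaries with string values.

```python
import statistics


def profile_column(values):
    """Summarise one column: row count, missing count, distinct count,
    and basic statistics when every non-missing value is numeric."""
    # Treat None and empty strings as missing (an assumption for this sketch).
    present = [v for v in values if v is not None and v != ""]
    summary = {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            numeric = None  # at least one value is not a number
            break
    if numeric:
        summary.update(
            min=min(numeric),
            max=max(numeric),
            mean=statistics.mean(numeric),
            median=statistics.median(numeric),
        )
    return summary


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = rows[0].keys() if rows else []
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample data, standing in for an extract from a client system.
sample_rows = [
    {"customer_id": "1", "region": "North", "spend": "120.5"},
    {"customer_id": "2", "region": "South", "spend": ""},
    {"customer_id": "3", "region": "North", "spend": "80"},
]

profile = profile_table(sample_rows)
for column, summary in profile.items():
    print(column, summary)
```

A summary like this quickly shows which fields have missing values or unexpected ranges, which is usually the first question to take back to the client.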
Modelling: The modelling phase consists of selecting the modelling technique, designing the test, and building the model. The details are specific to the type of problem. In predictive analytics projects, for example, this means building the model, creating the test set and deciding on evaluation criteria such as precision, accuracy or recall. In a reporting or analytics project, this could translate into building initial reports or visualisations and matching them against existing reports to ensure that parameters, calculations and measures are right.
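To make the evaluation criteria mentioned above concrete, here is a minimal sketch of how accuracy, precision and recall are computed for a binary classifier. The function names and the example labels are hypothetical and chosen only for illustration; in practice a library would normally do this for you.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(actual, predicted, positive=1):
    """Return accuracy, precision and recall as a dict.

    Ratios with a zero denominator are reported as 0.0 (a convention
    chosen for this sketch)."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted positives that really were positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of real positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical test-set labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = evaluate(actual, predicted)
print(metrics)
```

Which of the three to optimise depends on the business objective agreed in the first phase: recall matters more when missing a positive case is costly, precision when false alarms are.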
Evaluation: In the evaluation phase, the results are checked against the defined business objectives. In live projects, some of this is covered by client demos, which allow early validation and interpretation of the results in business terms. Modelling and evaluation form an iterative process.
Deployment: The user guide describes the deployment phase in general terms. The deliverable could be a final report or a software component. According to the user guide, the phase consists of planning the deployment, monitoring and maintenance. In a typical project, we could apply this process when we build the final components and documentation for handover to the client’s ops/data team.
Conclusion
CRISP-DM is a useful addition to a data analyst’s toolkit. Above, I have indicated the various areas where it can be applied. It can help not only data analysts with beginner or intermediate skills but also experienced practitioners avoid the ever-present data rabbit holes.
Reference: