6 steps to a successful data science project

Do you have a data science project ahead of you, or are you simply interested in how data mining works? We explain the widely used, cross-industry guideline for leading such projects to success in a well-structured way: CRISP-DM (Cross Industry Standard Process for Data Mining).

The method was developed in 1996 by well-known companies (Daimler AG, NCR Corporation, etc.) with the aim of establishing a uniform standard for data mining projects. It divides a project into six phases. The model is a cycle, and the phases are not strictly sequential: moving back to an earlier phase is expected.

1. Business understanding (task definition)

The first step, Business Understanding, focuses on the problem or question: which problem should be solved, and how much potential does the project have? The answer also determines how much budget the project can justify. The easiest way to make this concrete is to define economic target criteria.

2. Data understanding (selection of relevant data sets)

Which data sources are available to achieve this goal? Is all the necessary information already at hand, or does data have to be obtained first?
Depending on what the available data can support, it may also make sense to reformulate the goal at this point.

3. Data preparation (cleaning and transformation)

Once the data set has been compiled, it is time to inspect and prepare the data. This is usually one of the most time-consuming parts of the project, because the data typically has to be cleaned and transformed before it can be used.
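To make the cleaning and transforming step more tangible, here is a minimal sketch in plain Python. The records, the field names (`customer`, `age`, `revenue`) and the chosen rules (drop rows without revenue, remove duplicates, fill missing ages with the median) are assumptions made for illustration only. Real projects will need their own rules.

```python
from statistics import median

# Hypothetical raw records; field names and values are invented for illustration.
raw_rows = [
    {"customer": " Alice ", "age": "34", "revenue": "1200.50"},
    {"customer": "Bob", "age": "", "revenue": "830.00"},
    {"customer": "alice", "age": "34", "revenue": "1200.50"},
    {"customer": "Carol", "age": "51", "revenue": "n/a"},
]


def to_float(value):
    """Return the value as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def prepare(rows):
    # Clean: normalise customer names and parse the numeric fields.
    cleaned = [
        {
            "customer": row["customer"].strip().lower(),
            "age": to_float(row["age"]),
            "revenue": to_float(row["revenue"]),
        }
        for row in rows
    ]

    # Drop rows where revenue is missing or unparseable (assumed rule).
    cleaned = [row for row in cleaned if row["revenue"] is not None]

    # Remove exact duplicates while keeping the original order.
    seen = set()
    unique = []
    for row in cleaned:
        key = (row["customer"], row["age"], row["revenue"])
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Transform: fill missing ages with the median of the known ages.
    # Assumes at least one age is known; median() raises an error otherwise.
    fill_value = median(row["age"] for row in unique if row["age"] is not None)
    for row in unique:
        if row["age"] is None:
            row["age"] = fill_value
    return unique


prepared = prepare(raw_rows)
for row in prepared:
    print(row)
```

Even in this tiny example, half of the raw rows are dropped or changed, which hints at why this phase tends to take so much time.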

4. Modelling (selection and application of data mining methods)

This is where the algorithm comes in. A data scientist builds a model and typically uses simple metrics to check whether it is suitable for the task. At this point, it is often necessary to make adjustments and take a step back to data preparation.
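As a rough illustration of building a model and checking it with a simple metric, the sketch below fits a straight line by least squares and measures the mean absolute error on data the model has not seen. The numbers are invented, and linear regression is just one possible choice here; CRISP-DM itself does not prescribe any particular algorithm or metric.

```python
from statistics import mean

# Hypothetical data: (advertising spend, sales). Values are invented.
train = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]
holdout = [(5.0, 11.1), (6.0, 12.9)]


def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in points)
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


def mean_absolute_error(points, slope, intercept):
    """Average absolute gap between the observed and the predicted values."""
    return mean([abs(y - (slope * x + intercept)) for x, y in points])


slope, intercept = fit_line(train)
error = mean_absolute_error(holdout, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, holdout MAE={error:.2f}")
```

If the error on the held-out data is much larger than expected, that is often the signal to go back to data preparation rather than to tune the model further.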

5. Evaluation (evaluation and interpretation of results)

If the model looks sound at this stage, it must be checked against the target: can the previously defined goals be achieved with this model? If not, the goal or the model must be adapted accordingly.
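One way to check a model against economic target criteria is to translate its hits and misses into money and compare the result with the goal. The sketch below does this for a hypothetical classification model; the counts, the value per case and the target figure are all invented assumptions, not figures from a real project.

```python
# Hypothetical results of a model on held-out data:
# tp/fp = true/false positives, fn/tn = false/true negatives.
counts = {"tp": 80, "fp": 40, "fn": 20, "tn": 860}

# Assumed economic value per case, for example in euros.
value_per_case = {"tp": 50.0, "fp": -10.0, "fn": -50.0, "tn": 0.0}

# Assumed economic target defined during business understanding.
target_profit = 2000.0


def expected_profit(counts, value_per_case):
    """Sum of case counts multiplied by their assumed economic value."""
    return sum(counts[kind] * value_per_case[kind] for kind in counts)


profit = expected_profit(counts, value_per_case)
meets_target = profit >= target_profit
print(f"profit={profit:.2f}, target={target_profit:.2f}, ok={meets_target}")
```

A model can score well on a technical metric and still miss the target in a check like this, which is why the evaluation phase looks at the business goal and not only at model accuracy.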

6. Deployment (application of results)

If the evaluation meets the quality requirements, the model is put into operation. At this point, a process for ongoing monitoring is usually set up as well, to check whether the model still fits the goals.
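Ongoing monitoring can be as simple as comparing the model's recent error with the error measured during evaluation. The following sketch assumes a hypothetical helper `needs_review` and an arbitrary tolerance of 25 percent; both the function and the threshold are illustrative choices, not part of CRISP-DM.

```python
from statistics import mean


def needs_review(recent_errors, baseline_error, tolerance=0.25):
    """Return True if the recent mean error exceeds the baseline error
    by more than the given relative tolerance (assumed: 25 percent)."""
    if not recent_errors:
        return False  # nothing observed yet, so nothing to flag
    return mean(recent_errors) > baseline_error * (1 + tolerance)


# Hypothetical values: error at evaluation time and errors seen in production.
baseline = 0.18
stable_week = [0.15, 0.20, 0.19]
drifting_week = [0.30, 0.35, 0.28]

print("stable week needs review:", needs_review(stable_week, baseline))
print("drifting week needs review:", needs_review(drifting_week, baseline))
```

When such a check raises a flag, the cycle starts again: the data, the model or even the original goal may need to be revisited.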

