The Five Steps of Data Science

Introduction to data science

Data Science Interview Guide

Overview of the five steps

Ask an interesting question

Obtain the data

Explore the data

Model the data

Communicate and visualize the results

Basic questions for data exploration

  • Is the data organized or not?
    We are checking whether the data is presented in a row/column structure.
    For the most part, data will be presented in an organized fashion; in this
    book, over 90% of our examples begin with organized data. Nevertheless,
    this is the most basic question to answer before diving any deeper into
    our analysis. As a general rule of thumb, if we have unorganized data, we
    want to transform it into a row/column structure. For example, earlier in
    this book, we looked at ways to transform text into a row/column structure
    by counting the number of words/phrases.
  • What does each row represent?
    Once we know how the data is organized and are looking at a
    row/column-based dataset, we should identify what each row actually
    represents. This step is usually very quick and helps put the rest of the
    analysis in perspective.
  • What does each column represent?
    We should identify each column by its level of data, by whether it is
    quantitative or qualitative, and so on. This categorization might change
    as our analysis progresses, but it is important to begin this step as
    early as possible.
  • Are there any missing data points?
    Data isn’t perfect. Sometimes we might be missing data because of human
    or mechanical error. When this happens, we, as data scientists, must make
    decisions about how to deal with these discrepancies.
  • Do we need to perform any transformations on the columns?
    Depending on the level/type of data in each column, we might need to
    perform certain types of transformations. For example, for the sake of
    statistical modeling and machine learning, we would generally like each
    column to be numerical. Of course, we will use Python to make any and all
    transformations. All the while, we are asking ourselves the overall
    question: what can we infer from the preliminary inferential statistics?
    We want to understand our data a bit better than when we first found it.
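
The questions above can be walked through in a few lines of Python. What follows is only a minimal sketch, not a definitive workflow: the dataset, its column names, and the helper functions (column_names, column_kinds, missing_counts, one_hot) are all hypothetical and invented for illustration. It uses plain Python so that it runs anywhere; in practice a library such as pandas covers the same checks.

```python
# Hypothetical toy dataset: each row is one customer order (illustrative values only).
rows = [
    {"order_id": 1, "city": "Jakarta", "amount": 12.5},
    {"order_id": 2, "city": "Bandung", "amount": None},  # None marks a missing value
    {"order_id": 3, "city": "Jakarta", "amount": 7.0},
]


def column_names(rows):
    """Is the data organized? Return the shared column names, or raise."""
    names = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != names:
            raise ValueError("rows do not share one row/column structure")
    return names


def column_kinds(rows):
    """What does each column represent? A rough first guess only:
    a column is called quantitative if all non-missing values are numbers.
    (An ID column is numeric but is really a label, so revisit this later.)"""
    kinds = {}
    for name in column_names(rows):
        values = [r[name] for r in rows if r[name] is not None]
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        kinds[name] = "quantitative" if numeric else "qualitative"
    return kinds


def missing_counts(rows):
    """Are there missing data points? Count None values per column."""
    return {
        name: sum(1 for r in rows if r[name] is None) for name in column_names(rows)
    }


def one_hot(rows, column):
    """Do we need a transformation? Replace a qualitative column
    with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows if r[column] is not None})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded


names = column_names(rows)
kinds = column_kinds(rows)
missing = missing_counts(rows)
encoded = one_hot(rows, "city")

print(names)
print(kinds)
print(missing)
print(encoded[0])
```

Here each row represents one order, the missing-value count flags the amount column for a decision (drop the row, fill it in, or leave it), and the one_hot step turns the qualitative city column into numerical columns that a model could use.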

Recap

Data Science Enthusiast, Remote Worker, Course Trainer, Archery Coach, Psychology and Philosophy Student

Desi Ratna Ningsih
