Data Analysis Life Cycle

Data analysis is the process of collecting data and analyzing it to extract meaningful insights that help stakeholders make decisions and solve problems.

The data analysis life cycle consists of six phases:

  • Ask
  • Prepare
  • Process
  • Analyze
  • Share
  • Act

Ask

You start the process of data analysis the same way you start solving any problem in daily life: by asking yourself what goes in the ‘To Find’ section, that is, what exactly you are trying to find out.

By pinpointing what is asked of you, you can gather only the relevant data later. It’s like having a sea of data at your disposal when you need only a bucket of it for the job, so in the ‘Ask’ phase you define what data needs to go into that bucket.

You take the stakeholders on board and clearly define the questions that need to be answered. This is the foundation of data analysis; if it goes wrong, the whole building of analysis comes crumbling down later. Therefore, the analyst should be clear, complete, concrete, and concise in their communication with the stakeholders.

Prepare

In the ‘Prepare’ stage, we gather, store, and organize the data needed to answer the questions that were defined in the ‘Ask’ phase. The data can be taken from an established data warehouse or a separate database, or new data can be collected as needed.

Then the data is stored, and decisions are made about what form it should be stored in and what level of privacy it needs. Should the data be stored on an external storage device, on an internal storage device, or in the cloud? Lastly, the data is organized in the form that is best suited to the questions posed.
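As a rough illustration of the organizing step, here is a minimal Python sketch. The CSV content, the column names (order_id, region, amount), and the helper functions are all hypothetical, made up for this example; it assumes the question from the ‘Ask’ phase is about regions, so the records are grouped by region.

```python
import csv
import io

# Hypothetical raw export; in practice this would come from a file or a database.
RAW_CSV = """order_id,region,amount
1001,North,250.00
1002,South,99.50
1003,North,120.25
"""

def load_orders(text):
    """Parse CSV text into a list of dicts with properly typed fields."""
    reader = csv.DictReader(io.StringIO(text))
    orders = []
    for row in reader:
        orders.append({
            "order_id": int(row["order_id"]),
            "region": row["region"],
            "amount": float(row["amount"]),
        })
    return orders

def group_by_region(orders):
    """Organize orders by region, the dimension the question is assumed to be about."""
    grouped = {}
    for order in orders:
        grouped.setdefault(order["region"], []).append(order)
    return grouped

orders = load_orders(RAW_CSV)
by_region = group_by_region(orders)
```

Converting the text fields to numbers at this point means the later phases can calculate with them directly.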

Process

The data that was gathered, stored, and organized in the ‘Prepare’ stage is brought to the ‘Process’ stage so that it can be cleaned. Data is prone to bias: the person recording or collecting it may not realize it, but their personal biases or demographic limitations can skew the data. These biases are managed in the ‘Process’ stage.

Null values are another issue that is fixed in the ‘Process’ stage. Often some attributes are not recorded or answered while collecting data, and the observation is given a null value for them. These null values are either dropped or replaced with a substitute value, such as the mean or median of the attribute, so that the observation can be included in calculations while distorting the result as little as possible.
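One common way to handle null values is to fill them with the median of the values that were recorded. The sketch below assumes that approach; the ages list and the impute_with_median helper are hypothetical and exist only for illustration.

```python
from statistics import median

# Hypothetical survey ages; None marks a value that was not recorded.
ages = [34, None, 29, 41, None, 38]

def impute_with_median(values):
    """Return a copy of values with each None replaced by the median of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute: no recorded values")
    fill = median(known)
    return [fill if v is None else v for v in values]

cleaned = impute_with_median(ages)
```

Filling with the median leaves the median of the column unchanged, but other statistics such as the mean or the spread can still shift, so it is worth noting which values were filled in.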

Outliers

In addition to all this, ‘outliers’ are another discrepancy dealt with in the ‘Process’ phase. Outliers are anomalous values that can result from abnormal conditions, a series of coincidences, or an instrument malfunction. It is good practice to deal with them before analyzing the data so that the analysis is accurate.
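A widely used rule of thumb flags a value as an outlier when it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. The sketch below assumes that rule; the readings and the split_outliers helper are hypothetical.

```python
from statistics import quantiles

# Hypothetical sensor readings; 98.6 looks like an instrument glitch.
readings = [20.1, 19.8, 20.4, 20.0, 98.6, 19.9, 20.3, 20.2]

def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using the interquartile-range rule."""
    q1, _, q3 = quantiles(values, n=4)  # first quartile, median, third quartile
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers

kept, outliers = split_outliers(readings)
```

Returning the outliers separately, rather than silently dropping them, lets you check whether each one is a genuine error before excluding it from the analysis.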

Analyze

In the ‘Analyze’ phase, the cleaned data is put into formulas and equations so that calculations can be performed, typically in spreadsheets or with SQL.

One can use Python, R, SQL, or any other language that has packages for doing calculations. This is a crucial phase because it is the basis of all the visualizations you will build later on. If you use the wrong formula or the wrong function, it can make your whole analysis null and void.
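To see how much the choice of function can matter, consider this small Python sketch. The salary figures are made up for illustration. Both functions run without error, but they answer different questions.

```python
from statistics import mean, median

# Hypothetical monthly salaries in one small team; one value is much larger.
salaries = [3000, 3200, 3100, 2900, 15000]

average = mean(salaries)    # pulled upward by the single large salary
typical = median(salaries)  # the middle value, unaffected by how large that salary is

print(f"mean:   {average}")
print(f"median: {typical}")
```

Here the mean is 5440 while the median is 3100. If the question is what a typical team member earns, reporting the mean would mislead the stakeholders even though the calculation itself is correct.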

Share

Good communication is the key to being a good data analyst. A data analyst has to be a good storyteller. It means that they should be able to effectively communicate their findings to the stakeholders who might not be from a technical background.

A good storyteller captivates their audience

Therefore, a data analyst has to find a way to make those stakeholders see the trends in the findings and understand their gravity. That is what visualization is for.

In the ‘Share’ phase, bar charts, pie charts, graphs, heat maps, flowcharts, and other visualizations are used to transform facts into figures. Dashboards and reports summarize all the visualizations in one place. This phase is important because the decision-making power rests with the stakeholders, and if they are not convinced by a compelling presentation of facts and figures, all the work the analyst has done is in vain.

Different tools and platforms can be used for visualization, including Microsoft Excel, Power BI, Tableau, Python, and R.
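To illustrate the idea of turning numbers into a picture, here is a minimal Python sketch that draws a bar chart as plain text using only the standard library. The sales figures and the text_bar_chart helper are hypothetical; a real report would more likely use a plotting library or one of the tools listed above.

```python
# Hypothetical sales totals per region (illustrative numbers only).
sales = {"North": 120, "South": 45, "East": 80}

def text_bar_chart(data, width=20):
    """Render a horizontal bar chart as lines of text, scaled to the largest value.

    Assumes data is non-empty and its largest value is positive.
    """
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return lines

chart = text_bar_chart(sales)
print("\n".join(chart))
```

Even in this crude form, the relative lengths of the bars make the gap between regions visible at a glance, which is harder to see in a column of raw numbers.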

Act

‘Act’ is the final phase of the data analysis life cycle. In this phase, the stakeholders decide whether or not they are convinced by your findings and presentation.

They may ask you to redo part of the analysis, point out mistakes or gaps, or remain unconvinced and reject all of your propositions. This phase determines whether the stakeholders are already convinced or whether you need to do more to convince them.

About Khurram
