Published in Geek Culture

450+ Practice Questions That Will Make You a Pandas, NumPy, and SQL Pro

A self-curated collection of practice questions to improve your Data Science Skills.

Photo by Cytonn Photography on Unsplash

Pandas, NumPy, and SQL undoubtedly sit at the core of every data science project. These tools are indispensable throughout the development life cycle of a data-driven project, so proficiency in them is essential for starting or sustaining a career in data science.

A few tools utilized in a Data Science project (Image by author)

Because these tools are so widely used in both industry and academia, familiarity with their functions and syntax has become a necessity for aspiring data scientists.

Therefore, to help you build your expertise with three of the most popular tools in data science and challenge your existing knowledge, I am presenting a self-curated data science notebook with over 450 practice questions.

The motivation behind introducing this exercise is to strengthen your logical muscle and help you internalize data manipulation with three of the most sought-after tools in the data ecosystem.

You can get the practice notebook here. The steps to use the notebook are provided in the introduction of the notebook itself.

#1 Pandas

The Pandas library has become the go-to tool for data scientists for all sorts of tabular data analysis, management, and processing.

Pandas (Source: Pandas Website) (Edited by Author)

The Pandas API offers an extensive set of functions, and data scientists keep finding new ways to put them to work. You can read about the most frequently used Pandas methods in one of my earlier blogs.

To improve your skills and experience with handling tabular data, I have formulated over 200 questions in this exercise specifically for Pandas. These encompass a wide range of topics such as:

  • Input and Output Operations in Pandas
  • General functions on a DataFrame and a Series
  • Data manipulation
  • Filtering operations
  • GroupBy methods
  • Joins
  • Rolling window approaches
  • Data analysis, and many more.
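To give a rough flavour of three of these topics (GroupBy, joins, and rolling windows), here is a minimal sketch on a tiny made-up DataFrame. The `sales` and `stores` tables and their columns are hypothetical and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical sales data, made up purely for illustration
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "day": [1, 2, 1, 2, 3],
    "revenue": [100, 150, 200, 50, 250],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["north", "south"]})

# GroupBy: total revenue per store
totals = sales.groupby("store")["revenue"].sum()

# Join: attach each store's region to every sales row (SQL-style LEFT JOIN)
joined = sales.merge(stores, on="store", how="left")

# Rolling window: 2-day moving average of revenue, computed separately per store
sales["rolling_avg"] = sales.groupby("store")["revenue"].transform(
    lambda s: s.rolling(window=2).mean()
)

print(totals)
print(joined)
print(sales)
```

The rolling average assumes the rows are already ordered by `day` within each store; on real data you would typically call `sort_values` first.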

Moreover, the exercise includes a deep dive into a real-world tabular dataset using Pandas, which shows how various Pandas methods apply in practice.

Sample Questions:

A couple of sample questions from the Pandas exercise are mentioned below:

  • Sort DataFrame based on another list
  • Swap two rows of a DataFrame
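The answers are not given in this post, so here is one possible approach to each, assuming "sort based on another list" means ordering rows by a custom order of one column's values. The DataFrame, the `order` list, and the `swap_rows` helper are hypothetical, and the notebook's intended solutions may differ.

```python
import pandas as pd

df = pd.DataFrame({"size": ["M", "S", "L", "S"], "price": [20, 10, 30, 12]})

# Sort rows by a custom order of the "size" column, given as a separate list
order = ["S", "M", "L"]
rank = {value: position for position, value in enumerate(order)}
# key= maps each size to its position in `order`; "stable" keeps ties in original order
sorted_df = df.sort_values("size", key=lambda col: col.map(rank), kind="stable")

# Swap the rows at positions i and j by permuting the row positions
def swap_rows(frame, i, j):
    positions = list(range(len(frame)))
    positions[i], positions[j] = positions[j], positions[i]
    # reset_index gives the result a clean 0..n-1 index again
    return frame.iloc[positions].reset_index(drop=True)

swapped = swap_rows(df, 0, 2)
print(sorted_df)
print(swapped)
```

A common alternative for the first question is to convert the column to an ordered `pd.Categorical` with `categories=order` and then call `sort_values` on it.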

#2 NumPy

NumPy (short for Numerical Python) is widely used for efficient numerical computation in Python.

NumPy (Source: NumPy Website) (Edited by Author)

NumPy is arguably one of the most important libraries ever built for Python. Much of the Python data ecosystem, Pandas included, depends on NumPy and its core functions in one way or another.

If you are just getting started with NumPy, my earlier blog describing its most widely used methods may help.

For the purpose of this data science exercise, I have created close to 150 questions specifically for NumPy. Topics include:

  • NumPy Array Creation Methods
  • NumPy Array Manipulation
  • Mathematical Operations on NumPy Arrays
  • Matrix and Vector Operations
  • Sorting Methods
  • Searching Methods
  • Statistical Methods, and more.
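As a quick orientation, the sketch below touches array creation, manipulation, matrix and vector products, sorting, searching, and a simple statistic, all on small hypothetical arrays. It is only a sampler of common NumPy calls, not a summary of the notebook's questions.

```python
import numpy as np

# Creation: a 1-D range reshaped into a 3x4 matrix
a = np.arange(12).reshape(3, 4)

# Manipulation: transpose and vertical stacking
t = a.T                      # shape (4, 3)
stacked = np.vstack([a, a])  # shape (6, 4)

# Matrix and vector operations
product = a @ t              # (3, 4) @ (4, 3) -> (3, 3)
v = np.array([1, 0, -1, 2])
mv = a @ v                   # matrix-vector product, shape (3,)

# Sorting and searching
values = np.array([7, 2, 9, 4])
order = np.argsort(values)                # indices that would sort `values`
sorted_values = np.sort(values)
pos = np.searchsorted(sorted_values, 5)   # insertion point that keeps the order
big = np.where(values > 5)[0]             # indices of elements greater than 5

# Statistics: mean of each column
col_means = a.mean(axis=0)
print(product, mv, order, pos, big, col_means, sep="\n")
```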

Sample Questions:

A couple of sample questions from the NumPy exercise are mentioned below:

  • Check whether all the elements of the array are finite or not
  • Find the sum ignoring the nan elements
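One straightforward way to answer both questions, sketched on hypothetical arrays: `np.isfinite` flags `nan` and infinities, and `np.nansum` skips the `nan` values that an ordinary sum would propagate.

```python
import numpy as np

arr = np.array([1.0, 2.5, np.nan, 4.0])
with_inf = np.array([1.0, np.inf, 3.0])
clean = np.array([1.0, 2.0, 3.0])

# np.isfinite is False for nan, +inf and -inf; .all() collapses it to one bool
all_finite = np.isfinite(arr).all()
all_finite_inf = np.isfinite(with_inf).all()
all_finite_clean = np.isfinite(clean).all()

# A plain sum propagates nan, while np.nansum treats nan as zero
plain_sum = arr.sum()
nan_safe_sum = np.nansum(arr)
print(all_finite, all_finite_inf, all_finite_clean, plain_sum, nan_safe_sum)
```

Note that `np.nansum` returns `0.0` for an array made entirely of `nan`, which may or may not be what you want.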

#3 SQL

While Pandas and NumPy are popular Python libraries for data management and processing, SQL is a language in its own right, designed for querying and managing databases.

SQL (Image by author)

One thing Pandas and SQL have in common is that both are excellent tools for working with tabular data. As a result, many operations can be performed with either one, and I have explored a few of these Pandas-to-SQL conversions in an earlier blog.
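To illustrate what such a conversion looks like, here is a sketch that runs the same filter, group, aggregate, and sort query twice: once as SQL against an in-memory SQLite database (through Python's built-in `sqlite3` module) and once as a Pandas method chain. The `orders` table and its columns are made up for the example.

```python
import sqlite3
import pandas as pd

# Hypothetical table used on both sides of the comparison
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cy", "bob"],
    "amount": [30, 20, 50, 10, 40],
})

# SQL side: load the DataFrame into an in-memory SQLite database and query it
conn = sqlite3.connect(":memory:")
orders.to_sql("orders", conn, index=False)
sql_result = pd.read_sql_query(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > 15
    GROUP BY customer
    ORDER BY total DESC
    """,
    conn,
)
conn.close()

# Pandas side: the same filter -> group -> aggregate -> sort pipeline
pandas_result = (
    orders[orders["amount"] > 15]
    .groupby("customer", as_index=False)["amount"].sum()
    .rename(columns={"amount": "total"})
    .sort_values("total", ascending=False)
    .reset_index(drop=True)
)
print(sql_result)
print(pandas_result)
```

Roughly, `WHERE` maps to boolean indexing, `GROUP BY ... SUM` to `groupby(...).sum()`, and `ORDER BY` to `sort_values`.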

The SQL part of my data science exercise provides over 100 questions to improve your SQL skills. Topics I have covered in this part include:

  • Data manipulation
  • Data analysis
  • Table updates
  • GroupBy operations
  • SQL Joins
  • Filtering methods, and more.
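For a taste of the table-update, join, and filtering topics, here is a minimal sketch that uses SQLite through Python's standard `sqlite3` module. The `employees` and `departments` tables are hypothetical, and other database engines may differ slightly in syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two small hypothetical tables
cur.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, dept_name TEXT)")
cur.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary INTEGER)"
)
cur.executemany("INSERT INTO departments VALUES (?, ?)", [(1, "Data"), (2, "Ops")])
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Asha", 1, 90), (2, "Ben", 2, 60), (3, "Chen", 1, 75)],
)

# Table update: a flat raise of 5 for everyone in department 1
cur.execute("UPDATE employees SET salary = salary + 5 WHERE dept_id = 1")

# Join + group + filter on groups: departments whose average salary exceeds 70
rows = cur.execute(
    """
    SELECT d.dept_name, AVG(e.salary) AS avg_salary
    FROM employees AS e
    JOIN departments AS d ON e.dept_id = d.id
    GROUP BY d.dept_name
    HAVING AVG(e.salary) > 70
    ORDER BY avg_salary DESC
    """
).fetchall()
conn.close()
print(rows)
```

Here the `WHERE` clause restricts which rows the `UPDATE` touches, while `HAVING` filters the aggregated groups after `GROUP BY`.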

Like the Pandas exercise, the SQL exercise also includes a deep dive into a real-world tabular dataset, this time using SQL.

Final Thoughts

Lately, while writing my own blogs and reading those of other writers that focus purely on data science tips and tricks, I have noticed that these posts generally assume the reader is already familiar with the foundational concepts behind the proposed methods.

As a result, a significant portion of readers may be left behind because they find these tricks hard to follow, either due to unfamiliarity with the basics or because they feel overwhelmed by material they have never seen before.

I hope this exercise will serve as a great starting point for those seeking to gain confidence in using these popular data science tools and those seeking to improve and challenge their existing expertise.

Access the practice notebook here.

Thanks for reading.

Doge meme created by the author on imgflip.com.

Avi Chawla

Top Writer in AI | Become a Data Science PRO. Get the Data Science Mastery Notebook with 450+ Pandas, NumPy, and SQL questions: https://bit.ly/450notebook.