Review of Pandas DataFrame | Beginner Intro

[Image; source: edureka.co]

Let’s learn how to get data in and look at it. We’ll need to remember a few things about pandas. First, pandas is a library for data analysis. Its most powerful tool is the DataFrame, a tabular data structure with labeled rows and columns.

[Image; source: geeksforgeeks.org]

As an example, we’ll use a DataFrame with Boston data (here). The rows are labeled by a special data structure called an index. Indexes in pandas are tailored lists of labels that permit fast lookup and some powerful relational operations. The index labels in the Boston DataFrame are unnamed. Labeled rows and columns improve the clarity and intuition of many data analysis tasks.
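
To make the idea of row and column labels concrete, here is a minimal sketch. It assumes a toy DataFrame named df with made-up placeholder values and hypothetical row labels "a", "b", "c"; it is not the Boston data.

```python
import pandas as pd

# Toy frame with made-up placeholder values (not the Boston data).
df = pd.DataFrame(
    {"crim": [0.1, 0.2, 0.3], "medv": [10.0, 20.0, 30.0]},
    index=["a", "b", "c"],  # explicit row labels (hypothetical)
)

print(df.index)               # the Index object holding the row labels
print(df.index.name)          # None: this index has no name
print(df.loc["b", "medv"])    # label-based lookup of a single cell
```

The loc accessor looks a value up by its row label and column label, which is the kind of fast lookup an index is meant to support.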

When we ask for the type of boston, we see that it is a DataFrame.

When we ask for its shape, we see that it has 506 rows and 15 columns.
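
The type and shape checks might look like the sketch below. Since the real file is not loaded here, it assumes a small stand-in frame of placeholder integers with 10 rows and 4 columns; the name boston and the column names are borrowed from the article, the values are not.

```python
import numpy as np
import pandas as pd

# Stand-in frame of placeholder integers (10 rows, 4 columns), not the real data.
boston = pd.DataFrame(
    np.arange(40).reshape(10, 4),
    columns=["crim", "zn", "indus", "medv"],
)

print(type(boston))   # the DataFrame class
print(boston.shape)   # (rows, columns) tuple; (10, 4) for this stand-in
```

On the real data set, the same shape attribute would report the 506 rows and 15 columns mentioned above.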

The DataFrame’s columns attribute gives the names of its columns: Unnamed, crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black, lstat, medv. Notice that the columns attribute of boston is also a pandas Index.
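
Here is a small sketch of the columns attribute. It reads a tiny made-up CSV from a string rather than the real boston.csv. The empty first header cell is an assumption about how the file was saved (the article does not say); it is one common way an "Unnamed" column appears when reading a CSV.

```python
import io
import pandas as pd

# Tiny made-up CSV whose first header cell is empty, as happens when a
# frame is saved together with its index. The numbers are placeholders.
csv_text = ",crim,zn,medv\n1,0.1,18.0,10.0\n2,0.2,0.0,20.0\n"
boston = pd.read_csv(io.StringIO(csv_text))

print(boston.columns)         # column labels, including "Unnamed: 0"
print(type(boston.columns))   # the labels are stored in a pandas Index
```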

DataFrames can be sliced like NumPy arrays or Python lists, using colons to specify the start, end, and stride of a slice.
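
The sketch below puts the three kinds of slicing side by side, using the same start, end, and stride on a list, a NumPy array, and a DataFrame. The names labels, arr, and df and their contents are made up for illustration.

```python
import numpy as np
import pandas as pd

labels = list("abcdefghij")               # plain Python list
arr = np.arange(10)                       # NumPy array
df = pd.DataFrame({"x": np.arange(10)})   # placeholder one-column frame

# start=1, end=8 (excluded), stride=3 selects positions 1, 4 and 7
print(labels[1:8:3])    # ['b', 'e', 'h']
print(arr[1:8:3])       # [1 4 7]
print(df.iloc[1:8:3])   # rows at positions 1, 4 and 7
```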

First, we can slice from the start of the DataFrame up to, but not including, the row at position 5, using the iloc accessor to express the slice positionally. This gives the first five rows.
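
A minimal sketch of that positional slice, assuming a 10-row stand-in frame of placeholder integers in place of the real data:

```python
import numpy as np
import pandas as pd

# Stand-in frame of placeholder integers (10 rows), not the real data.
boston = pd.DataFrame(
    np.arange(40).reshape(10, 4),
    columns=["crim", "zn", "indus", "medv"],
)

first_five = boston.iloc[:5]   # positions 0 through 4; position 5 is excluded
print(first_five)
```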

Second, we can slice from the fifth-last row to the end of the DataFrame using a negative index.
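
The negative-index slice can be sketched the same way, again on a 10-row stand-in frame of placeholder integers rather than the real data:

```python
import numpy as np
import pandas as pd

# Stand-in frame of placeholder integers (10 rows), not the real data.
boston = pd.DataFrame(
    np.arange(40).reshape(10, 4),
    columns=["crim", "zn", "indus", "medv"],
)

last_five = boston.iloc[-5:]   # from the fifth-last row through the end
print(last_five)
```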

There’s another way to see just the top rows of the DataFrame: the head method. Calling head(5) returns the first 5 rows, and calling head(2) returns just the first 2 rows.
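
As a rough illustration, the two head calls could be written as follows. The frame is a stand-in of placeholder integers, and the names top_five and top_two are only for this example.

```python
import numpy as np
import pandas as pd

# Stand-in frame of placeholder integers (10 rows), not the real data.
boston = pd.DataFrame(
    np.arange(40).reshape(10, 4),
    columns=["crim", "zn", "indus", "medv"],
)

top_five = boston.head(5)   # first 5 rows, same rows as boston.iloc[:5]
top_two = boston.head(2)    # first 2 rows
print(top_five)
print(top_two)
```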

The head method is particularly useful here because our DataFrame has over 500 rows.

The opposite of head is tail. Calling tail without an argument returns the last five rows by default, and calling tail(3) returns the last three rows. Like head, tail gives a useful summary of large DataFrames.
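
A short sketch of both tail calls, assuming the same kind of 10-row stand-in frame with placeholder integers:

```python
import numpy as np
import pandas as pd

# Stand-in frame of placeholder integers (10 rows), not the real data.
boston = pd.DataFrame(
    np.arange(40).reshape(10, 4),
    columns=["crim", "zn", "indus", "medv"],
)

last_default = boston.tail()   # no argument: the last 5 rows
last_three = boston.tail(3)    # the last 3 rows
print(last_default)
print(last_three)
```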

Another useful summary method is info. It reports other useful summary information, including the kind of index, the column labels, the number of rows and columns, and the data type of each column.
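
Here is a sketch of calling info on a stand-in frame of placeholder integers. One detail worth knowing: info prints its summary instead of returning it, so the second half of the example captures the text through the buf parameter. The names buffer and summary are only for this example.

```python
import io
import numpy as np
import pandas as pd

# Stand-in frame of placeholder integers (10 rows), not the real data.
boston = pd.DataFrame(
    np.arange(40).reshape(10, 4),
    columns=["crim", "zn", "indus", "medv"],
)

boston.info()   # prints index kind, column labels, counts and dtypes

# To keep the summary as a string, write it into an in-memory buffer.
buffer = io.StringIO()
boston.info(buf=buffer)
summary = buffer.getvalue()
```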

The columns of a DataFrame are themselves a specialized data structure called a Series. Extracting a single column from a DataFrame returns a Series.
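
Extracting a column could look like this. The sketch picks the medv column as an arbitrary example (the article does not say which column it extracts) and uses a stand-in frame of placeholder integers.

```python
import numpy as np
import pandas as pd

# Stand-in frame of placeholder integers (10 rows), not the real data.
boston = pd.DataFrame(
    np.arange(40).reshape(10, 4),
    columns=["crim", "zn", "indus", "medv"],
)

medv = boston["medv"]   # square brackets with one column label
print(type(medv))       # a Series, not a DataFrame
```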

Notice that the extracted Series has its own head method and inherits its name attribute from the DataFrame column.
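
Both points can be checked in a few lines. As before, the frame is a stand-in of placeholder integers, and medv is just an example column.

```python
import numpy as np
import pandas as pd

# Stand-in frame of placeholder integers (10 rows), not the real data.
boston = pd.DataFrame(
    np.arange(40).reshape(10, 4),
    columns=["crim", "zn", "indus", "medv"],
)

medv = boston["medv"]
print(medv.head())   # a Series has head too: the first 5 entries
print(medv.name)     # "medv", taken from the column label
```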

To extract the numerical entries from the Series, use the values attribute. The data in this Series actually form a NumPy array, which is what the values attribute yields.
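
A minimal sketch of the values attribute, once more on a stand-in frame of placeholder integers. It also shows to_numpy, a pandas method that gives the same array for this numeric column; that call is an addition here and not something the article uses.

```python
import numpy as np
import pandas as pd

# Stand-in frame of placeholder integers (10 rows), not the real data.
boston = pd.DataFrame(
    np.arange(40).reshape(10, 4),
    columns=["crim", "zn", "indus", "medv"],
)

medv = boston["medv"]
values = medv.values         # the underlying data without the labels
print(type(values))          # a NumPy ndarray
print(values)

same_data = medv.to_numpy()  # method form, same numbers for this column
```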

A pandas Series, then, is a one-dimensional labeled NumPy array, and a DataFrame is a two-dimensional labeled array whose columns are Series. We’ve seen a few concepts extending what we already knew, including head, tail, info, index, values, and Series. Take some time to practice using these concepts; here you can find the link to the notebook and the boston.csv file on GitHub.

Written by

I’m a Data Science student. I love to create, learn, and share my skills, whether that means learning a new technology, brushing up on current skills, or writing Data Science articles.
