Let’s learn how to get data in and look at it. We’ll need to remember a few things about pandas. First, pandas is a library for data analysis. Its central tool is the DataFrame, a tabular data structure with labeled rows and columns.
As an example, we’ll use a DataFrame with the Boston data. The rows are labeled by a special data structure called an Index. Indexes in pandas are tailored lists of labels that permit fast lookup and some powerful relational operations. The index labels in the Boston DataFrame are unnamed. Labeled rows and columns improve the clarity and intuition of many data analysis tasks.
When we ask for the type of boston, it’s a DataFrame.
When we ask for its shape, it has 506 rows and 15 columns.
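As a rough sketch of these two checks, the snippet below builds a tiny stand-in DataFrame and asks for its type and shape. The three columns and their values are illustrative only, and the file name in the comment is an assumption; with the real file the shape would be the 506 rows and 15 columns mentioned above.

```python
import pandas as pd

# Hypothetical stand-in for the Boston data: 4 rows, 3 columns.
# With the real file you would load it instead, for example:
#   boston = pd.read_csv("boston.csv")   # path assumed
boston = pd.DataFrame({
    "crim": [0.006, 0.027, 0.027, 0.032],
    "rm": [6.575, 6.421, 7.185, 6.998],
    "medv": [24.0, 21.6, 34.7, 33.4],
})

print(type(boston))   # the pandas DataFrame class
print(boston.shape)   # (4, 3): a (rows, columns) tuple
```

Note that shape is an attribute, not a method, so it is written without parentheses.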
The DataFrame columns attribute gives the names of its columns: Unnamed, crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black, lstat, medv. Notice the boston columns attribute is also a pandas Index.
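Here is a minimal sketch of inspecting the columns attribute, again on a made-up three-column stand-in rather than the real Boston data. It also shows that the row labels are an Index, and that an Index can look up the position of a label.

```python
import pandas as pd

# Illustrative stand-in with only three of the columns; values are made up.
boston = pd.DataFrame({
    "crim": [0.006, 0.027],
    "zn": [18.0, 0.0],
    "medv": [24.0, 21.6],
})

cols = boston.columns
print(list(cols))                           # ['crim', 'zn', 'medv']
print(isinstance(cols, pd.Index))           # True
print(isinstance(boston.index, pd.Index))   # True: the row labels are an Index too
print(cols.get_loc("zn"))                   # 1: position of the label 'zn'
```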
DataFrames can be sliced like NumPy arrays or Python lists, using colons to specify the start, end, and stride of a slice.
First, we can slice from the start of the DataFrame up to, but not including, position 5, which gives the first five rows. We use the iloc accessor to express the slice positionally.
Second, we can slice from the fifth-to-last row to the end of the DataFrame using a negative index.
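The slices described above can be sketched as follows. The 12-row DataFrame is a hypothetical stand-in, chosen so the selected positions are easy to read off; the stride example is an extra illustration of the third slice component.

```python
import pandas as pd

# Hypothetical 12-row stand-in; the default index labels are 0..11.
boston = pd.DataFrame({"row_id": range(12), "medv": range(20, 32)})

first_five = boston.iloc[:5, :]     # positions 0..4 (the stop, 5, is excluded)
last_five = boston.iloc[-5:, :]     # fifth-to-last row through the end
every_third = boston.iloc[::3, :]   # stride of 3: positions 0, 3, 6, 9

print(list(first_five.index))    # [0, 1, 2, 3, 4]
print(list(last_five.index))     # [7, 8, 9, 10, 11]
print(list(every_third.index))   # [0, 3, 6, 9]
```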
There’s another way to see just the top rows of the DataFrame: the head method. Specifying head(5) returns the first five rows. Specifying head(2) returns just the first two rows.
The head method is particularly useful because our data frame here has over 500 rows.
The opposite of head is tail. Specifying tail without an argument returns the last five rows by default. Specifying tail(3) returns the last three rows. Again, tail gives a useful summary of large DataFrames.
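A short sketch of head and tail on a hypothetical 10-row stand-in (the real DataFrame has over 500 rows, but the calls are the same):

```python
import pandas as pd

# Hypothetical 10-row stand-in with default index labels 0..9.
boston = pd.DataFrame({"medv": range(100, 110)})

top_default = boston.head()    # no argument: first 5 rows
top_two = boston.head(2)       # first 2 rows
bottom_default = boston.tail() # no argument: last 5 rows
bottom_three = boston.tail(3)  # last 3 rows

print(len(top_default), len(bottom_default))   # 5 5
print(list(top_two.index))                     # [0, 1]
print(list(bottom_three.index))                # [7, 8, 9]
```

For a non-negative count, head(n) selects the same rows as iloc[:n].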
Another useful summary method is info. It prints other useful summary information, including the kind of index, the column labels, the number of rows and columns, and the data type of each column.
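As a sketch, here is info called on a small made-up stand-in. By default info prints its report and returns None, so the example also captures the same report in a text buffer through the buf parameter, in case you want to inspect it as a string.

```python
import io
import pandas as pd

# Illustrative stand-in: one float column and two integer columns.
boston = pd.DataFrame({
    "crim": [0.006, 0.027, 0.027],
    "chas": [0, 0, 1],
    "rad": [1, 2, 2],
})

boston.info()              # prints index kind, columns, non-null counts, dtypes

buffer = io.StringIO()
boston.info(buf=buffer)    # same report, written to the buffer instead
summary = buffer.getvalue()
```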
The columns of a DataFrame are themselves a specialized data structure called a Series. Extracting a single column from a DataFrame returns a Series.
Notice that the extracted Series has its own head method and inherits its name attribute from the DataFrame column.
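A minimal sketch of extracting one column, using a made-up stand-in and the medv column as an example:

```python
import pandas as pd

# Illustrative stand-in; values are made up for the example.
boston = pd.DataFrame({
    "rm": [6.575, 6.421, 7.185, 6.998, 7.147, 6.430],
    "medv": [24.0, 21.6, 34.7, 33.4, 36.2, 28.7],
})

medv = boston["medv"]   # single-column selection returns a Series

print(type(medv))       # the pandas Series class
print(medv.name)        # medv: inherited from the column label
print(medv.head(3))     # a Series has its own head method
```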
To extract the numerical entries from the Series, use the values attribute. The data in this Series actually form a NumPy array, which is what the values attribute yields.
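The following sketch pulls the underlying array out of a small made-up Series. The to_numpy method shown alongside is not mentioned in the text above; it is included only as the alternative that the pandas documentation recommends for the same purpose.

```python
import numpy as np
import pandas as pd

# Illustrative Series standing in for a column such as medv.
medv = pd.Series([24.0, 21.6, 34.7], name="medv")

arr = medv.values          # the underlying NumPy array, without the labels
print(type(arr))           # numpy.ndarray
print(arr.shape)           # (3,): one-dimensional

same = medv.to_numpy()     # alternative spelling with the same contents
print(np.array_equal(arr, same))   # True
```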
A pandas Series, then, is a one-dimensional labeled NumPy array, and a DataFrame is a two-dimensional labeled array whose columns are Series.
We’ve seen a few concepts extending what we already knew, including head, tail, info, index, values, and Series.
Take some time to practice using these concepts. Here you can find the link to the notebook and the boston.csv file on GitHub.