Posted under » Python Data Analysis on 12 June 2023
Continuing from Pandas intro: Series. While a Series is like a single column, a DataFrame is a whole table. A Series behaves much like a NumPy array, which we will cover later.
In its simplest form:
    import pandas as pd

    data = {
        "calories": [420, 380, 390],
        "duration": [50, 40, 45]
    }

    # named df so that the examples below can reuse it
    df = pd.DataFrame(data)
    print(df)

       calories  duration
    0       420        50
    1       380        40
    2       390        45
Pandas uses the loc attribute to return one or more specified rows. With a single label, the result is a Pandas Series.
    print(df.loc[0])

    calories    420
    duration     50
    Name: 0, dtype: int64
To return two rows, pass a list of labels. When you pass a list inside [], the result is a Pandas DataFrame.
    print(df.loc[[0, 1]])

       calories  duration
    0       420        50
    1       380        40
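The difference between a single label and a list of labels can be checked directly. This is a minimal sketch that rebuilds the calories/duration table from above; the names one_row and two_rows are made up for illustration.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)

one_row = df.loc[0]        # single label -> Series
two_rows = df.loc[[0, 1]]  # list of labels -> DataFrame

print(type(one_row).__name__)   # Series
print(type(two_rows).__name__)  # DataFrame
print(two_rows.shape)           # (2, 2)
```

Passing a one-element list such as df.loc[[0]] also gives a DataFrame, which is handy when later code expects a table rather than a Series.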
Learn how to update a dataframe.
With the index argument, you can name your own indexes.
    import pandas as pd

    data = {
        "calories": [420, 380, 390],
        "duration": [50, 40, 45]
    }

    df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
    print(df)

          calories  duration
    day1       420        50
    day2       380        40
    day3       390        45

    print(df[['calories', 'duration']])
Both print(df) and print(df[['calories', 'duration']]) produce the same output, because the list names every column. So if you just want to see the duration, select that column on its own:
    print(df['duration'])

    day1    50
    day2    40
    day3    45
    Name: duration, dtype: int64
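Single and double brackets look similar but return different types. A small sketch, assuming the same day1 to day3 table as above (as_series and as_frame are illustrative names only):

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

as_series = df["duration"]   # single brackets -> Series
as_frame = df[["duration"]]  # double brackets -> one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_series.tolist())        # [50, 40, 45]
```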
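If you want to save the contents to a text file, or format each line yourself, you can loop over the rows with a for statement and iterrows(). Open the file once, outside the loop, so that it is closed properly afterwards.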
    # open the file once; 'a' appends to any existing content
    with open('output.txt', 'a') as f:
        for index, row in df.iterrows():
            print(index, ': ', row['duration'], file=f)

    Contents of output.txt:

    day1 :  50
    day2 :  40
    day3 :  45
Another print example.
You can achieve the same with itertuples(). Each row is a named tuple: the index label is available as row.Index and every column as an attribute.
    for row in df.itertuples():
        # row.Index holds the index label ("day1", "day2", ...)
        print(row.Index, ': ', row.duration)
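To make the itertuples() loop concrete, here is a short sketch that collects the formatted lines in a list before printing them. The list name lines and the f-string layout are choices made for this example, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

lines = []
for row in df.itertuples():
    # row.Index is the index label; each column is an attribute
    lines.append(f"{row.Index} : {row.duration}")

print("\n".join(lines))
# day1 : 50
# day2 : 40
# day3 : 45
```

Attribute access only works for column names that are valid Python identifiers, so a column called "heart rate" would need iterrows() or positional access instead.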
Use the named index in the loc attribute to return the specified row(s).
    print(df.loc["day2"])

    calories    380
    duration     40
    Name: day2, dtype: int64
If more than one row is labelled "day2", all of those rows are returned, and the result is a DataFrame instead of a Series.
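The duplicate-label case is easy to try out. The sketch below uses a hypothetical table, df_dup, in which "day2" deliberately appears twice in the index; the data itself is the same as before.

```python
import pandas as pd

# Hypothetical data: the label "day2" is repeated on purpose
df_dup = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day2"],
)

result = df_dup.loc["day2"]

print(type(result).__name__)        # DataFrame
print(len(result))                  # 2
print(result["calories"].tolist())  # [380, 390]
```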
You can assign the result to another variable. Here each variable holds a Series.
    first = df.loc["day2"]
    second = df.loc["day3"]
    print(first, "\n\n\n", second)

    calories    380
    duration     40
    Name: day2, dtype: int64


     calories    390
    duration     45
    Name: day3, dtype: int64
Just as iterrows() walks through the rows, you can loop over the columns. Iterating over a DataFrame yields its column names, so you can go through all columns or only a selected few.
    for column in df[['calories', 'duration']]:
        # look up the Series that belongs to this column name
        columnSeriesObj = df[column]
        print('Column Name : ', column)
        print('Column Contents : ', columnSeriesObj.values)

    Column Name :  calories
    Column Contents :  [420 380 390]
    Column Name :  duration
    Column Contents :  [50 40 45]
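As an alternative to looking each column up by name, DataFrame.items() yields the column name and its Series together. A minimal sketch on the same table; the dictionary name contents is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

contents = {}
for name, column in df.items():
    # name is the column label, column is the matching Series
    contents[name] = column.tolist()
    print(name, contents[name])
# calories [420, 380, 390]
# duration [50, 40, 45]
```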
cont... Read data into Dataframe