Posted under » Python Data Analysis on 7 July 2023
From Analysing Dataframe
Data cleaning means fixing bad data in your data set. Bad data could be:
A value that may be familiar to NumPy users is NaN. When pandas determines that a series holds numeric values but cannot find a number to represent an entry, it uses NaN. This value stands for Not a Number and is skipped by default in aggregations such as sum() and mean(), much as NULL is in SQL.
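To see how NaN behaves in calculations, here is a minimal sketch. The series name and its values are made up for illustration and are not from the original example. It compares the default aggregation behaviour with a strict aggregation and with element-wise arithmetic.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen only for this illustration
scores = pd.Series([2.0, np.nan, 4.0])

# Aggregations skip NaN by default (skipna=True)
total = scores.sum()      # adds 2.0 and 4.0 only
average = scores.mean()   # averages the two non-missing values

# Passing skipna=False lets the NaN propagate into the result
strict_total = scores.sum(skipna=False)

# Element-wise arithmetic keeps NaN in the position where it was
shifted = scores + 1

print(total)
print(average)
print(strict_total)
print(shifted)
```

So "ignored" applies to aggregations. In element-wise operations the missing entry stays missing.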
>>> import numpy as np
>>> import pandas as pd
>>> nan_series = pd.Series([2, np.nan],
...     index=['Ono', 'Clapton'])
>>> nan_series
Ono        2.0
Clapton    NaN
dtype: float64
Note that the type of this series is float64, not int64. The type is a float because float64 supports NaN, which int64 does not.
When pandas sees numeric data (the 2) alongside np.nan, it coerces the 2 to a float value.
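The coercion can be checked directly. The sketch below builds two small series, one with only integers and one with a NaN added, and compares their dtypes. The variable names are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical series used only to compare dtypes
ints_only = pd.Series([2, 3])
with_nan = pd.Series([2, np.nan])

# Without missing values, pandas keeps an integer dtype
print(ints_only.dtype)

# With np.nan present, the whole series is stored as float64
print(with_nan.dtype)

# The integer 2 is now stored as the float 2.0
first_value = with_nan.iloc[0]
print(type(first_value), first_value)

# isna() marks which entries are missing
missing_mask = with_nan.isna()
print(missing_mask.tolist())
```

Checking the dtype after loading data is a quick way to spot a column that was expected to hold integers but contains missing entries.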
... to be continued