Cleaning data

Posted under » Python Data Analysis on 7 July 2023

From Analysing Dataframe

Data cleaning means fixing bad data in your data set. One common kind of bad data is a missing value.

A value that may be familiar to NumPy users is NaN. When pandas determines that a series holds numeric values but cannot find a number to represent an entry, it uses NaN. This value stands for Not a Number. pandas aggregations such as sum() and mean() skip it by default, much as SQL aggregates skip NULL, while element-wise arithmetic on a NaN entry simply yields NaN again.
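As a minimal sketch of how NaN behaves in calculations, the snippet below compares element-wise arithmetic with aggregation. The series name `scores` and its values are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Illustrative values only; `scores` is a hypothetical name.
scores = pd.Series([2.0, np.nan, 4.0])

# Element-wise arithmetic propagates NaN: the missing entry stays missing.
doubled = scores * 2

# Aggregations skip NaN by default (skipna=True).
total = scores.sum()
mean = scores.mean()

# With skipna=False the NaN propagates into the aggregate instead.
total_strict = scores.sum(skipna=False)

# isna() flags missing entries; count() counts the non-missing ones.
n_missing = int(scores.isna().sum())
n_present = int(scores.count())

print(doubled.tolist(), total, mean, total_strict, n_missing, n_present)
```

Here the sum and mean are computed over the two present values only, while `doubled` still has a NaN in the middle position.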

>>> import numpy as np
>>> import pandas as pd
>>> nan_series = pd.Series([2, np.nan],
...     index=['Ono', 'Clapton'])
>>> nan_series
Ono        2.0
Clapton    NaN
dtype: float64

Note that the type of this series is float64, not int64. The type is a float because float64 supports NaN, which int64 does not. When pandas sees numeric data (2) alongside np.nan, it coerces the 2 to a float value. ... to be continued
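The coercion can be checked directly by building the same kind of series with and without a missing value. This is only a sketch: the names `ints_only` and `with_nan` are hypothetical, and the exact integer width reported for the first series may depend on the platform and pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical names, used only for this illustration.
ints_only = pd.Series([2, 3])       # all integers: an integer dtype is kept
with_nan = pd.Series([2, np.nan])   # one missing value: coerced to float64

print(ints_only.dtype)
print(with_nan.dtype)

# The integer 2 is now stored as a float.
first_value = with_nan.iloc[0]
print(type(first_value), first_value)
```

A quick look at `.dtype` after loading data is therefore a cheap way to spot a numeric column that contains missing entries.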
