Pandas get_dummies: create dummy variables

Posted under » Python Data Analysis on 13 June 2023

Initially I thought get_dummies was for getting permutations and combinations, which I normally use on this website. However, it is not the same function.

The description of the function reads like Greek: "Each variable is converted in as many 0/1 variables as there are different values. Columns in the output are each named after a value; if the input is a DataFrame, the name of the original variable is prepended to the value". get_dummies is mainly used for machine learning.

If you don't understand it, you are not alone. Best to show you what it means.

>>> import pandas as pd
>>> s = pd.Series(list('abca'))
>>> pd.get_dummies(s)
       a      b      c
0   True  False  False
1  False   True  False
2  False  False   True
3   True  False  False

The Series has 4 items (a, b, c, a) but only 3 unique values (a, b, c), so the output has 4 rows and 3 columns. Each row has exactly one `True', in the column that matches the value of that row. Row 0 holds 'a', so it is True under a and False under b and c. Column a has True twice because there are 2 a in `abca'. A value that repeats often collects many `True' entries, while in the extreme case a value that appears only once has a single `True'. So we can see which values are `popular' by how many `True' occurrences they have.
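To make the `popular' idea concrete, here is a minimal sketch that counts the `True' entries per column by summing them. The names dummies and counts are just for illustration.

```python
import pandas as pd

s = pd.Series(list('abca'))
dummies = pd.get_dummies(s)

# Summing a column of True/False counts the True entries,
# i.e. how many times each value occurs in the Series.
counts = dummies.sum()
print(counts)
```

Here a gets 2 while b and c get 1 each. If the counts are all you need, s.value_counts() gives the same numbers directly.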

A slightly more complex example

>>> df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})
>>> pd.get_dummies(df, prefix=['col1', 'col2'])
   C  col1_a  col1_b  col2_a  col2_b  col2_c
0  1    True   False   False    True   False
1  2   False    True    True   False   False
2  3    True   False   False   False    True

prefix renames the dummy columns: the ones made from column A start with col1 and the ones made from column B start with col2. get_dummies finds just 2 distinct values (a and b) in A, but 3 (a, b and c) in B. However, it leaves column C as it is, because C holds numbers rather than strings, so it is not part of the True/False encoding.

So col1 has 2 columns (col1_a and col1_b) because A only has a and b, while col2 has 3 columns because B has a, b and c.
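To see what prefix actually changes, the sketch below compares the column names with and without it, using the same df as above. The variable names default_names and prefixed_names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

# Without prefix, the original column names (A and B) are prepended.
default_names = list(pd.get_dummies(df).columns)

# With prefix, col1 and col2 are prepended instead.
prefixed_names = list(pd.get_dummies(df, prefix=['col1', 'col2']).columns)

print(default_names)   # ['C', 'A_a', 'A_b', 'B_a', 'B_b', 'B_c']
print(prefixed_names)  # ['C', 'col1_a', 'col1_b', 'col2_a', 'col2_b', 'col2_c']
```

In both cases the numeric column C comes first and is passed through untouched.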

pd.get_dummies can create dummy variables from a Pandas Series, or from one or more columns in a Pandas DataFrame. It is often used for creating a sample data frame.
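If you only want dummies for some of the columns, the columns parameter picks them. The sketch below encodes column A only, and also passes dtype=int to get the 0/1 values from the description instead of True/False. Choosing A and the name encoded are my own picks for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

# Encode only column A; B and C are passed through unchanged.
# dtype=int asks for 0/1 instead of True/False.
encoded = pd.get_dummies(df, columns=['A'], dtype=int)
print(encoded)
```

The result keeps B and C as they were and adds A_a and A_b holding 1 or 0.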

You can also create sample data using ones.

