Posted under » Python Data Analysis on 13 June 2023
Initially I thought get_dummies was for generating permutations and combinations, which I normally use on this website. However, it is not the same function.
The official description reads like Greek: "Each variable is converted in as many 0/1 variables as there are different values. Columns in the output are each named after a value; if the input is a DataFrame, the name of the original variable is prepended to the value". get_dummies is mainly used for machine learning, where this is usually called one-hot encoding.
If you don't understand it, you are not alone. Best to show you what it means.
>>> import pandas as pd
>>> s = pd.Series(list('abca'))
>>> pd.get_dummies(s)
       a      b      c
0   True  False  False
1  False   True  False
2  False  False   True
3   True  False  False
The Series holds 4 items ('a', 'b', 'c', 'a') but only 3 unique values (a, b and c), so the output has 4 rows and 3 columns. Each row has exactly one True, in the column that matches the row's value. Row 0 is 'a', so it is True under a and False under b and c. Column a contains True twice because there are 2 a's in 'abca'. Values that repeat a lot get many True entries, while in the extreme case a value that appears once gets only one. So we can see which values are 'popular' by how many True occurrences their column has.
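To put a number on that 'popularity', one option is to add up each dummy column, since every True counts as 1. This is only a small sketch on the same Series; the variable names `dummies` and `counts` are my own.

```python
import pandas as pd

s = pd.Series(list('abca'))
dummies = pd.get_dummies(s)

# Summing a column counts how many rows are True for that value.
counts = {value: int(total) for value, total in dummies.sum().items()}
print(counts)

# Every row belongs to exactly one value, so each row sums to 1.
row_totals = [int(total) for total in dummies.sum(axis=1)]
print(row_totals)
```

For a plain count like this, `s.value_counts()` gives the same numbers without building the dummy columns first.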
A slightly more complex example:
>>> df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})
>>> pd.get_dummies(df, prefix=['col1', 'col2'])
   C  col1_a  col1_b  col2_a  col2_b  col2_c
0  1    True   False   False    True   False
1  2   False    True    True   False   False
2  3    True   False   False   False    True
The prefix argument renames the dummy columns: those made from column A start with col1 and those made from column B start with col2. Column C is numeric, so get_dummies leaves it as it is and it takes no part in the True/False encoding.
col1 has 2 columns because A holds just 2 distinct values, a and b. col2 has 3 columns because B holds a, b and c.
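If you prefer the 0/1 values that the official description talks about instead of True/False, get_dummies accepts a `dtype` argument. The sketch below reuses the same DataFrame; the name `encoded` is my own, and I am assuming a reasonably recent pandas.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})

# dtype=int asks for 0/1 integers instead of booleans.
encoded = pd.get_dummies(df, prefix=['col1', 'col2'], dtype=int)

# The untouched numeric column comes first, then the dummy columns.
print(list(encoded.columns))
print(encoded['col2_c'].tolist())
```

The column list shows the prefix at work: 2 columns for col1 and 3 for col2, with C carried over unchanged.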
pd.get_dummies can create dummy variables from a pandas Series, or from one or more columns of a pandas DataFrame. It is also often used for creating a sample data frame.
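To encode only some of the columns of a DataFrame, get_dummies has a `columns` argument. Here is a minimal sketch that encodes column A alone and leaves B and C untouched; the DataFrame is the one from the earlier example and the name `only_a` is my own.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'],
                   'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})

# Only the columns listed here are turned into dummy variables.
only_a = pd.get_dummies(df, columns=['A'])

print(list(only_a.columns))
print(only_a['B'].tolist())
```

Without a prefix, the dummy columns take the original column name, so A becomes A_a and A_b, while B keeps its original letters.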
You can also create sample data using ones.