Python Pandas Hacks

Pandas Hack 01: Adding a single row to a pandas DataFrame

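import pandas as pd

df = pd.read_csv('/home/naved/Coding/datasets/test/test.csv')
# Append a copy of the first row to the end of the frame
df2 = pd.concat([df, df[:1]])
# Renumber the index so the appended row gets its own label (13 for a 13-row file)
df2.index = range(len(df2))
# Overwrite two fields of the new last row by position (.ix was removed in pandas 1.0)
df2.iloc[-1, 2] = 'b'
df2.iloc[-1, 3] = '2012/11/22'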
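
The snippet above depends on a local CSV file. As a minimal sketch of the same idea that runs anywhere, the version below builds a small frame in memory and appends one new row with pd.concat. The column names and values are made up for illustration and are not taken from the original file.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions for illustration
df = pd.DataFrame({
    "id": [1, 2, 3],
    "grade": ["a", "c", "a"],
    "date": ["2012/11/20", "2012/11/21", "2012/11/21"],
})

# Build the new row as a one-row DataFrame with the same columns
new_row = pd.DataFrame([{"id": 4, "grade": "b", "date": "2012/11/22"}])

# Concatenate and renumber the index so the new row gets the next label
df2 = pd.concat([df, new_row], ignore_index=True)
print(df2)
```

Passing ignore_index=True does the renumbering in one step, so there is no need to assign a new index afterwards.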

Pandas Hack 02: Quickly create a dataframe

In [1]: import pandas as pd

In [2]: from io import StringIO

In [3]: data = """UsrId JobNos
...: 1 4
...: 1 56
...: 2 23
...: 2 55
...: 2 41
...: 2 5
...: 3 78
...: 1 25
...: 3 1"""

In [4]: df = pd.read_csv(StringIO(data), sep=r'\s+')
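
Outside IPython the same trick works as a plain script. The sketch below reuses the data from the session above and assumes Python 3, where StringIO lives in the io module.

```python
from io import StringIO

import pandas as pd

# Whitespace-separated text, first line is the header
data = """UsrId JobNos
1 4
1 56
2 23
2 55
2 41
2 5
3 78
1 25
3 1"""

# StringIO makes the string look like a file; r'\s+' splits on runs of whitespace
df = pd.read_csv(StringIO(data), sep=r"\s+")
print(df.shape)  # (9, 2)
```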

Pandas Hack 03: Drop rows with duplicate values in a column

df = pd.read_csv('/home/conveyor/clients/wgu_small/ensembles/admissions/input/data/Lead_Label.csv', sep=';')

df2 = df.drop_duplicates(subset='LEAD_ID', keep='last')  # keep only the last row for each LEAD_ID

df2.to_csv('/home/conveyor/clients/wgu_small/ensembles/admissions/input/data/Lead_Label2.csv', index=False)
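
To see what keeping the last occurrence does without the client file, here is a small sketch on made-up data. Only the LEAD_ID column name comes from the snippet above; the label column and all values are assumptions.

```python
import pandas as pd

# Hypothetical data: LEAD_ID 101 appears twice
df = pd.DataFrame({
    "LEAD_ID": [101, 102, 101, 103],
    "label": [0, 1, 1, 0],
})

# For each LEAD_ID keep only the last row that carries it
df2 = df.drop_duplicates(subset="LEAD_ID", keep="last")
print(df2)
```

The first 101 row is dropped and the later one survives. Use keep='first' (the default) to get the opposite behaviour.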

Pandas Hack 04: Creating new dataframe from old

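import numpy as np
import pandas as pd
df = pd.read_csv('/home/conveyor/clients/wgu/ensembles/admissions/input/data/lead_information_Subject.csv')
# Give every lead a random 0/1 label
df2 = pd.DataFrame({'LEAD_ID': df.LEAD_ID, 'matriculated': np.random.randint(2, size=len(df.index))})
# Recode 0 as -1 so the labels are -1/1 (.loc replaces the removed .ix)
df2.loc[df2['matriculated'] == 0, 'matriculated'] = -1
df2.to_csv('Lead_Label_new2.csv', index=False)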
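
A possible variation, sketched on made-up IDs: instead of drawing 0/1 and recoding afterwards, draw the -1/1 labels directly with NumPy's Generator.choice. This is an alternative to the approach above, not the same code, and the seed is only there to make the run repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical lead IDs standing in for the CSV column
df = pd.DataFrame({"LEAD_ID": [101, 102, 103, 104, 105]})

# Seeded generator so the random labels are reproducible
rng = np.random.default_rng(seed=0)

# New frame: the ID column from the old one plus a random -1/1 label
df2 = pd.DataFrame({
    "LEAD_ID": df["LEAD_ID"],
    "matriculated": rng.choice([-1, 1], size=len(df)),
})
print(df2)
```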

Pandas Hack 06: Find duplicate rows in a dataset

df2 = df[df.duplicated('DisplayID')]  # new dataframe with the repeated rows (the first occurrence of each DisplayID is not included)
len(df2)  # number of duplicate rows
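
By default duplicated() flags only the second and later occurrences. The sketch below, on made-up data, also shows keep=False, which flags every row that shares a DisplayID with another row. Only the DisplayID column name comes from the snippet above.

```python
import pandas as pd

# Hypothetical data: DisplayID "a" appears three times
df = pd.DataFrame({
    "DisplayID": ["a", "b", "a", "c", "a"],
    "value": [1, 2, 3, 4, 5],
})

# Repeats only: the second and third "a" rows
repeats = df[df.duplicated("DisplayID")]

# Every row involved in a duplication, including the first "a"
all_copies = df[df.duplicated("DisplayID", keep=False)]

# Count of repeated rows without building a new frame
n_repeats = int(df.duplicated("DisplayID").sum())
print(n_repeats)  # 2
```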