Python Pandas Hacks

Pandas Hack 01: Adding a single row to a pandas dataframe

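import pandas as pd

df = pd.read_csv('/home/naved/Coding/datasets/test/test.csv')
# Append a copy of the first row; ignore_index renumbers the rows 0..n-1
df2 = pd.concat([df, df[:1]], ignore_index=True)
# Overwrite two fields of the new last row by position (third and fourth columns)
df2.iloc[-1, 2] = 'b'
df2.iloc[-1, 3] = '2012/11/22'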
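
The same idea can be tried without the CSV file. The sketch below is only an illustration, assuming a reasonably recent pandas; the frame and its column names (name, grade, date) are made up for the example and are not taken from test.csv.

```python
import pandas as pd

# Hypothetical data standing in for test.csv
df = pd.DataFrame({
    'name': ['x', 'y', 'z'],
    'grade': ['a', 'a', 'c'],
    'date': ['2012/11/20', '2012/11/21', '2012/11/21'],
})

# Duplicate the first row at the end and renumber the index
df2 = pd.concat([df, df[:1]], ignore_index=True)

# Edit the new last row by position: column 1 is 'grade', column 2 is 'date'
df2.iloc[-1, 1] = 'b'
df2.iloc[-1, 2] = '2012/11/22'

print(df2)
```

Using iloc[-1, ...] avoids hard-coding the row count, so the snippet keeps working if the input grows.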

Pandas Hack 02: Quickly create a dataframe

In [1]: import pandas as pd

In [2]: from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

In [3]: data = """UsrId JobNos
...: 1 4
...: 1 56
...: 2 23
...: 2 55
...: 2 41
...: 2 5
...: 3 78
...: 1 25
...: 3 1"""

In [4]: df = pd.read_csv(StringIO(data), sep=r'\s+')
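
Outside an IPython session, the same trick can be written as a plain script. This is a sketch assuming Python 3, where StringIO lives in the io module; the data is the same table as above, and the two print calls are just a quick sanity check on what was parsed.

```python
from io import StringIO

import pandas as pd

data = """UsrId JobNos
1 4
1 56
2 23
2 55
2 41
2 5
3 78
1 25
3 1"""

# A raw string keeps the regex \s+ (one or more whitespace characters) intact
df = pd.read_csv(StringIO(data), sep=r'\s+')

print(df.shape)   # (rows, columns)
print(df.dtypes)  # both columns are parsed as integers
```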

Pandas Hack 03: Drop rows based on duplicate values in a column

df = pd.read_csv('/home/conveyor/clients/wgu_small/ensembles/admissions/input/data/Lead_Label.csv', sep=';')

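# Keep the last row for each LEAD_ID (older pandas releases spelled these arguments cols= and take_last=True)
df2 = df.drop_duplicates(subset='LEAD_ID', keep='last')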

df2.to_csv('/home/conveyor/clients/wgu_small/ensembles/admissions/input/data/Lead_Label2.csv', index=False)
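
To see which rows survive, here is a small self-contained sketch. It assumes a recent pandas release, where the arguments are named subset and keep; the lead records are invented for illustration.

```python
import pandas as pd

# Hypothetical lead records; LEAD_ID 101 appears twice
df = pd.DataFrame({
    'LEAD_ID': [101, 102, 101, 103],
    'matriculated': [-1, 1, 1, -1],
})

# Keep only the last row seen for each LEAD_ID
df2 = df.drop_duplicates(subset='LEAD_ID', keep='last')

print(df2)
```

The first 101 row is dropped and the later one is kept; the surviving rows retain their original index labels.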

Pandas Hack 04: Creating new dataframe from old

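import numpy as np
import pandas as pd
df = pd.read_csv('/home/conveyor/clients/wgu/ensembles/admissions/input/data/lead_information_Subject.csv')
# New frame: the LEAD_ID column plus a random 0/1 label for every row
df2 = pd.DataFrame({'LEAD_ID': df.LEAD_ID, 'matriculated': np.random.randint(2, size=len(df.index))})
# Recode 0 as -1 so the labels are -1/1
df2.loc[df2['matriculated'] == 0, 'matriculated'] = -1
df2.to_csv('Lead_Label_new2.csv', index=False)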
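
A variation on the same idea, sketched under some assumptions: it needs NumPy 1.17 or later for default_rng, the source frame is invented, and it swaps the randint-then-recode approach for a direct draw from {-1, 1}.

```python
import numpy as np
import pandas as pd

# Hypothetical source frame; only LEAD_ID is reused
df = pd.DataFrame({'LEAD_ID': [101, 102, 103, 104, 105],
                   'subject': ['a', 'b', 'c', 'd', 'e']})

# Seeded generator so the random labels are reproducible
rng = np.random.default_rng(0)

# choice draws directly from {-1, 1}, so no 0 -> -1 fix-up step is needed
df2 = pd.DataFrame({
    'LEAD_ID': df['LEAD_ID'],
    'matriculated': rng.choice([-1, 1], size=len(df)),
})

print(df2)
```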

Pandas Hack 06: Find duplicate rows in a dataset

df2 = df[df.duplicated('DisplayID')] # new dataframe with the rows whose DisplayID already appeared earlier (first occurrences are not included)
len(df2) # number of such repeated rows
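
By default, duplicated() flags only the second and later occurrences. The sketch below, on made-up data and assuming a pandas version that supports the keep argument, contrasts that with keep=False, which flags every row in a repeated group.

```python
import pandas as pd

# Hypothetical data; DisplayID 7 appears three times
df = pd.DataFrame({'DisplayID': [7, 8, 7, 9, 7],
                   'value': [1, 2, 3, 4, 5]})

# Default (keep='first'): the first occurrence is not flagged
repeats = df[df.duplicated('DisplayID')]

# keep=False: every row whose DisplayID occurs more than once is flagged
all_copies = df[df.duplicated('DisplayID', keep=False)]

print(len(repeats), len(all_copies))
```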

Installing Pandas 0.11.0 with NumPy 1.7.1 on Ubuntu 12.04 LTS

Ubuntu 12.04 LTS ships with Python 2.7.3 by default. To install pandas 0.11.0 with NumPy 1.7.1, follow these steps:

1. sudo apt-get update
2. sudo apt-get install python-setuptools
3. sudo apt-get install python-dev
4. Download numpy-1.7.1.tar.gz from http://sourceforge.net/projects/numpy/files/NumPy/1.7.1/numpy-1.7.1.tar.gz/download
5. tar zxvf numpy-1.7.1.tar.gz && cd numpy-1.7.1/
6. sudo python setup.py install
7. sudo easy_install pandas==0.11.0 (the ==0.11.0 suffix pins that specific version)
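
Once the steps above have finished, a quick check from Python shows which versions the interpreter actually picks up. This is a minimal sketch; the string formatting is chosen so it runs unchanged on Python 2.7 and Python 3.

```python
import numpy as np
import pandas as pd

# Report the versions that were actually imported
print('numpy %s' % np.__version__)
print('pandas %s' % pd.__version__)
```

If the numbers differ from the ones requested, another copy of the package is probably earlier on the import path.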