13.3. Working with CSV files

13.4. What does a CSV file look like?

Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8

pandas provides the functions for reading and writing CSV files, so import it first:

import pandas as pd

13.5. Reading a CSV file into a DataFrame

df = pd.read_csv('demo.csv')
print(df)
             Name Hire Date   Salary  Sick Days remaining
0  Graham Chapman  03/15/14  50000.0                   10
1     John Cleese  06/01/15  65000.0                    8
2       Eric Idle  05/12/14  45000.0                   10
3     Terry Jones  11/01/13  70000.0                    3
4   Terry Gilliam  08/12/14  48000.0                    7
5   Michael Palin  05/23/13  66000.0                    8
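To try this without a demo.csv on disk, the same text can be read from memory. This is only a minimal sketch: the variable name csv_text and the shortened three-row sample are assumptions made for illustration.

```python
import io

import pandas as pd

# A shortened copy of the sample file, held in a string (illustrative name).
csv_text = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
"""

# read_csv accepts any file-like object, so StringIO stands in for the file.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns)
print(list(df.columns))  # column labels taken from the first line
```

Reading from a string like this is handy for quick experiments, since every option shown for demo.csv should behave the same way on a file-like object.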

13.6. Skipping lines at the start of the file

pd.read_csv("demo.csv", skiprows=1)  # the header line is skipped, so the first data row becomes the header
   Graham Chapman  03/15/14  50000.00  10
0     John Cleese  06/01/15   65000.0   8
1       Eric Idle  05/12/14   45000.0  10
2     Terry Jones  11/01/13   70000.0   3
3   Terry Gilliam  08/12/14   48000.0   7
4   Michael Palin  05/23/13   66000.0   8
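In the output above the column labels come from the first data row, because an integer skiprows counts lines from the very top of the file, header included. If the goal is to keep the header and drop a data row instead, passing a list of 0-based line numbers is one option. A small sketch, with csv_text, skipped_header and kept_header as illustrative names:

```python
import io

import pandas as pd

# Shortened in-memory copy of the sample data (illustrative).
csv_text = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
"""

# skiprows=1 drops the first line (the header), so the next line is used as the header.
skipped_header = pd.read_csv(io.StringIO(csv_text), skiprows=1)

# skiprows=[1] drops only line 1 (0-based), which is the first data row.
kept_header = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

print(list(skipped_header.columns))
print(kept_header["Name"].tolist())
```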

13.7. Getting only some columns while importing

pd.read_csv('demo.csv', usecols=['Name', 'Salary'])  # integer positions also work, e.g. usecols=[0, 2]
             Name   Salary
0  Graham Chapman  50000.0
1     John Cleese  65000.0
2       Eric Idle  45000.0
3     Terry Jones  70000.0
4   Terry Gilliam  48000.0
5   Michael Palin  66000.0
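The comment about selecting by index can be checked with a short sketch. The names csv_text, by_name and by_position are illustrative, and the data is a shortened in-memory copy of the sample.

```python
import io

import pandas as pd

# Shortened in-memory copy of the sample data (illustrative).
csv_text = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
"""

# Select columns by label ...
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Salary"])

# ... or by 0-based position: column 0 is Name, column 2 is Salary.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])

print(by_name.equals(by_position))
```

Selecting by position can be useful when the header labels are long or not known in advance, at the cost of breaking if the column order changes.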

13.8. Specifying the data type explicitly while importing

df = pd.read_csv('demo.csv', dtype={'Salary': float, 'Name': str})
print(df.dtypes)
Name                    object
Hire Date               object
Salary                 float64
Sick Days remaining      int64
dtype: object
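In the example above the requested types match what pandas would infer anyway, so the effect is hard to see. A case where dtype visibly changes the result is reading Salary as text, which keeps the digits exactly as written in the file. This is a sketch with illustrative names (csv_text, inferred, as_text):

```python
import io

import pandas as pd

# Shortened in-memory copy of the sample data (illustrative).
csv_text = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
"""

# Default behaviour: Salary is inferred as a floating-point number.
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype=str the column is kept as text, trailing zeros included.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"Salary": str})

print(inferred["Salary"].iloc[0])
print(as_text["Salary"].iloc[0])
```

Keeping a column as text is often the safer choice for identifiers such as postal codes or account numbers, where leading or trailing zeros carry meaning.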

13.9. Parsing date columns as dates instead of objects (the default behaviour)

df = pd.read_csv('demo.csv', parse_dates=['Hire Date'])
df.dtypes
Name                           object
Hire Date              datetime64[ns]
Salary                        float64
Sick Days remaining             int64
dtype: object
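With parse_dates, pandas has to work out the date format itself. When the format is known, as with the month/day/two-digit-year values in this file, an alternative is to read the column as text and convert it with pd.to_datetime and an explicit format string. A minimal sketch, assuming the format %m/%d/%y and using an illustrative csv_text:

```python
import io

import pandas as pd

# Shortened in-memory copy of the sample data (illustrative).
csv_text = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
"""

df = pd.read_csv(io.StringIO(csv_text))

# Convert after reading, stating the format so day and month cannot be swapped.
df["Hire Date"] = pd.to_datetime(df["Hire Date"], format="%m/%d/%y")

# Once the column holds datetimes, the .dt accessor exposes date parts.
print(df["Hire Date"].dt.year.tolist())
```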

13.10. Saving the DataFrame as a CSV file

df.to_csv('demo2.csv', index=False)  # index=False omits the row index (0, 1, 2, ...) that would otherwise be written as the first column
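To see what index=False changes without writing a file, to_csv can be called with no path, in which case it returns the CSV text as a string. The small two-row DataFrame and the variable names below are made up for illustration.

```python
import pandas as pd

# A tiny stand-in DataFrame (illustrative data taken from the sample rows).
df = pd.DataFrame(
    {"Name": ["Graham Chapman", "John Cleese"], "Salary": [50000.0, 65000.0]}
)

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # header line starts with an empty index label
print(without_index.splitlines()[0])  # header line contains only the column names
```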