13.3. Working with CSV files

13.4. What does a CSV file look like?

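A CSV (comma-separated values) file is plain text: each line is one record, the fields in a record are separated by commas, and the first line usually holds the column names. The examples in this section read a file named demo.csv with the following contents:

Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8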
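To reproduce the examples on your own machine you need demo.csv on disk. Here is a minimal sketch using only the standard library, assuming it is fine to write (and overwrite) demo.csv in the current working directory; the variable name CSV_TEXT is just for illustration. It saves the sample above and parses it back with Python's built-in csv module, which shows how each line splits into text fields.

```python
import csv
from pathlib import Path

# Sample data from above (CSV_TEXT is an illustrative name); the first line is the header row
CSV_TEXT = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
"""

# Save the sample as demo.csv in the current working directory (overwrites an existing file)
path = Path("demo.csv")
path.write_text(CSV_TEXT)

# csv.reader splits each line on commas and returns every field as a string
with path.open(newline="") as f:
    rows = list(csv.reader(f))

print(rows[0])                     # ['Name', 'Hire Date', 'Salary', 'Sick Days remaining']
print(rows[1])                     # ['Graham Chapman', '03/15/14', '50000.00', '10']
print(len(rows) - 1, "data rows")  # 6 data rows
```

Unlike pandas, the csv module does no type conversion: the salary stays the string '50000.00'.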

pandas provides functions for reading and writing CSV files, so start by importing it:

import pandas as pd

13.5. Reading a CSV file into a DataFrame

df = pd.read_csv('demo.csv')
print(df)

             Name Hire Date   Salary  Sick Days remaining
0  Graham Chapman  03/15/14  50000.0                   10
1     John Cleese  06/01/15  65000.0                    8
2       Eric Idle  05/12/14  45000.0                   10
3     Terry Jones  11/01/13  70000.0                    3
4   Terry Gilliam  08/12/14  48000.0                    7
5   Michael Palin  05/23/13  66000.0                    8
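A few quick checks are handy right after loading. The sketch below is self-contained: instead of opening demo.csv it passes the same text through io.StringIO, since read_csv accepts any file-like object as well as a path (CSV_TEXT is again an illustrative name).

```python
import io

import pandas as pd

# Same contents as demo.csv (CSV_TEXT is an illustrative name)
CSV_TEXT = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
"""

# read_csv accepts a path or any file-like object; StringIO stands in for demo.csv here
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.shape)            # (6, 4): six data rows, four columns
print(list(df.columns))    # the header row became the column labels
print(df.index.tolist())   # [0, 1, 2, 3, 4, 5]: a default integer index was added
print(df["Salary"].max())  # 70000.0
```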

13.6. Skipping lines at the start of the file

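skiprows=1 skips the first line of the file. In demo.csv that line is the header, so pandas takes the next line (Graham Chapman's record) as the column names and only five data rows remain:

pd.read_csv("demo.csv", skiprows=1)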

  Graham Chapman  03/15/14 50000.00  10
0    John Cleese  06/01/15  65000.0   8
1      Eric Idle  05/12/14  45000.0  10
2    Terry Jones  11/01/13  70000.0   3
3  Terry Gilliam  08/12/14  48000.0   7
4  Michael Palin  05/23/13  66000.0   8
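To drop lines without losing the header, skiprows also accepts a list of 0-based line numbers, and header=None together with names replaces the header with your own labels. A self-contained sketch with the same data read through io.StringIO (CSV_TEXT and the lowercase column names are hypothetical choices):

```python
import io

import pandas as pd

# Same contents as demo.csv (CSV_TEXT is an illustrative name)
CSV_TEXT = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
"""

# Skip line 1 of the file (the first data row) but keep line 0 as the header
no_graham = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[1])
print(no_graham.columns.tolist())  # original header kept
print(no_graham.loc[0, "Name"])    # John Cleese

# Skip the original header line and supply new column names instead
renamed = pd.read_csv(
    io.StringIO(CSV_TEXT),
    skiprows=1,
    header=None,
    names=["name", "hired", "salary", "sick_days"],  # hypothetical labels
)
print(len(renamed))            # 6: no data row is lost
print(renamed.loc[0, "name"])  # Graham Chapman
```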

13.7. Reading only selected columns

pd.read_csv('demo.csv', usecols=['Name', 'Salary'])  # column positions also work, e.g. usecols=[0, 2]

             Name   Salary
0  Graham Chapman  50000.0
1     John Cleese  65000.0
2       Eric Idle  45000.0
3     Terry Jones  70000.0
4   Terry Gilliam  48000.0
5   Michael Palin  66000.0
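usecols also takes column positions or a function. The sketch below uses the same self-contained setup (demo.csv replaced by io.StringIO, CSV_TEXT an illustrative name) and also shows that the result keeps the file's column order whatever order you list the names in, so reorder afterwards if you need a different one.

```python
import io

import pandas as pd

# Same contents as demo.csv (CSV_TEXT is an illustrative name)
CSV_TEXT = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
"""

# Select by 0-based column position
by_position = pd.read_csv(io.StringIO(CSV_TEXT), usecols=[0, 2])
print(by_position.columns.tolist())  # ['Name', 'Salary']

# A callable receives each column name; columns for which it returns True are kept
no_dates = pd.read_csv(io.StringIO(CSV_TEXT), usecols=lambda col: col != "Hire Date")
print(no_dates.columns.tolist())     # ['Name', 'Salary', 'Sick Days remaining']

# Listing names in another order does not reorder the result...
swapped = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["Salary", "Name"])
print(swapped.columns.tolist())      # ['Name', 'Salary']

# ...so index the DataFrame afterwards to get the order you want
reordered = swapped[["Salary", "Name"]]
print(reordered.columns.tolist())    # ['Salary', 'Name']
```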

13.8. Specifying column data types explicitly while importing

df = pd.read_csv('demo.csv', dtype={'Salary': float, 'Name': str})  # columns not listed keep their inferred types
print(df.dtypes)

Name                    object
Hire Date               object
Salary                 float64
Sick Days remaining      int64
dtype: object
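dtype is also useful for keeping a column exactly as written. In this self-contained sketch (io.StringIO instead of demo.csv, CSV_TEXT an illustrative name) Salary is read as text, which preserves the two decimal places that float parsing drops, and the sick-day counts are read as floats; Hire Date is not listed, so its type is still inferred.

```python
import io

import pandas as pd

# Same contents as demo.csv (CSV_TEXT is an illustrative name)
CSV_TEXT = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
"""

typed = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={"Salary": str, "Sick Days remaining": "float64"},
)

print(typed.loc[0, "Salary"])              # 50000.00 (a string, not a number)
print(typed["Sick Days remaining"].dtype)  # float64
print(typed.loc[0, "Hire Date"])           # 03/15/14 (inferred, still text)
```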

13.9. Parsing date columns as datetimes (by default they are read as plain strings of dtype object)

df = pd.read_csv('demo.csv', parse_dates=['Hire Date'])
print(df.dtypes)

Name                           object
Hire Date              datetime64[ns]
Salary                        float64
Sick Days remaining             int64
dtype: object
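The dates in demo.csv are written as month/day/two-digit year. Rather than leaving pandas to infer that layout, you can state it explicitly with pd.to_datetime after loading. A self-contained sketch, assuming the %m/%d/%y layout holds for every row (io.StringIO replaces demo.csv, CSV_TEXT is an illustrative name):

```python
import io

import pandas as pd

# Same contents as demo.csv (CSV_TEXT is an illustrative name)
CSV_TEXT = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Convert the text column with an explicit month/day/2-digit-year format
df["Hire Date"] = pd.to_datetime(df["Hire Date"], format="%m/%d/%y")

print(df["Hire Date"].dt.year.tolist())             # [2014, 2015, 2014, 2013, 2014, 2013]
# Real datetimes sort chronologically, not alphabetically
print(df.sort_values("Hire Date")["Name"].iloc[0])  # Michael Palin
```

Sorting the raw strings instead would put Graham Chapman (03/15/14) first, even though Michael Palin was hired earliest.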

13.10. Saving a DataFrame as a CSV file

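df.to_csv('demo2.csv', index=False)  # index=False skips writing the row index (0, 1, 2, ...) as an extra first column

Because this df was read with parse_dates, the Hire Date column is written in ISO form (for example 2014-03-15) rather than the original 03/15/14.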
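To see what index=False changes without creating files, note that to_csv returns the CSV text as a string when no path is given. A self-contained sketch (io.StringIO replaces demo.csv, CSV_TEXT is an illustrative name) comparing both settings and reading the index-free text back:

```python
import io

import pandas as pd

# Same contents as demo.csv (CSV_TEXT is an illustrative name)
CSV_TEXT = """Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # ,Name,Hire Date,Salary,Sick Days remaining
print(without_index.splitlines()[0])  # Name,Hire Date,Salary,Sick Days remaining

# Reading the index-free text back reproduces the original DataFrame
round_trip = pd.read_csv(io.StringIO(without_index))
print(round_trip.equals(df))          # True
```

Reading with_index back instead would give an extra column named "Unnamed: 0" holding the old index, which is why index=False is the usual choice when the index carries no information.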