13.3. Working with CSV files#
13.4. What does a CSV file look like?#
Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
The functions for working with CSV files are provided by pandas, so import it first:
import pandas as pd
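To follow along, the sample data has to exist on disk. The following is a minimal sketch, assuming the file is named demo.csv (the name used throughout this section) and is written to the current working directory; the CSV_TEXT variable is only an illustrative name.

```python
from pathlib import Path

import pandas as pd

# The sample data shown earlier, copied verbatim.
CSV_TEXT = """\
Name,Hire Date,Salary,Sick Days remaining
Graham Chapman,03/15/14,50000.00,10
John Cleese,06/01/15,65000.00,8
Eric Idle,05/12/14,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/14,48000.00,7
Michael Palin,05/23/13,66000.00,8
"""

# Assumption: writing into the current working directory is acceptable.
Path("demo.csv").write_text(CSV_TEXT)

# Quick sanity check that the file can be read back.
df = pd.read_csv("demo.csv")
print(df.shape)  # (6, 4): six data rows, four columns
```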
13.5. Reading a csv file into dataframe#
df = pd.read_csv('demo.csv')
print(df)
             Name  Hire Date   Salary  Sick Days remaining
0  Graham Chapman   03/15/14  50000.0                   10
1     John Cleese   06/01/15  65000.0                    8
2       Eric Idle   05/12/14  45000.0                   10
3     Terry Jones   11/01/13  70000.0                    3
4   Terry Gilliam   08/12/14  48000.0                    7
5   Michael Palin   05/23/13  66000.0                    8
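read_csv also accepts a file-like object, which is convenient for experimenting without a file on disk. This is a small sketch, assuming a shortened copy of the sample data held in a string (csv_text is an illustrative name, not part of pandas):

```python
import io

import pandas as pd

# Shortened copy of the sample data (first two rows only).
csv_text = (
    "Name,Hire Date,Salary,Sick Days remaining\n"
    "Graham Chapman,03/15/14,50000.00,10\n"
    "John Cleese,06/01/15,65000.00,8\n"
)

# io.StringIO wraps the string so read_csv can treat it like an open file.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (2, 4)
print(list(df.columns))  # column names taken from the first line
```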
13.6. Skipping lines at the start of the file#
pd.read_csv('demo.csv', skiprows=1)  # skips the first line (the header), so the first data row is used as the column names
| | Graham Chapman | 03/15/14 | 50000.00 | 10 |
|---|---|---|---|---|
| 0 | John Cleese | 06/01/15 | 65000.0 | 8 |
| 1 | Eric Idle | 05/12/14 | 45000.0 | 10 |
| 2 | Terry Jones | 11/01/13 | 70000.0 | 3 |
| 3 | Terry Gilliam | 08/12/14 | 48000.0 | 7 |
| 4 | Michael Palin | 05/23/13 | 66000.0 | 8 |
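As the output above shows, an integer skiprows counts from the very first line, so the header is the line that gets dropped. If the goal is to keep the header, two possible alternatives are sketched below, assuming a shortened in-memory copy of the data (csv_text and the replacement column names are illustrative):

```python
import io

import pandas as pd

csv_text = (
    "Name,Hire Date,Salary,Sick Days remaining\n"
    "Graham Chapman,03/15/14,50000.00,10\n"
    "John Cleese,06/01/15,65000.00,8\n"
    "Eric Idle,05/12/14,45000.00,10\n"
)

# skiprows=1 drops line 0 (the header); the first data row becomes the header.
no_header = pd.read_csv(io.StringIO(csv_text), skiprows=1)

# A list gives 0-based line numbers to skip: [1] removes only the first
# data row and leaves the header line in place.
kept_header = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

# Alternatively, skip the original header and supply column names yourself.
renamed = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=1,
    header=None,
    names=["name", "hire_date", "salary", "sick_days"],  # illustrative names
)

print(list(no_header.columns))
print(list(kept_header.columns))
print(list(renamed.columns))
```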
13.7. Getting only some columns while importing#
pd.read_csv('demo.csv', usecols=['Name', 'Salary'])  # column positions also work, e.g. usecols=[0, 2]
| | Name | Salary |
|---|---|---|
| 0 | Graham Chapman | 50000.0 |
| 1 | John Cleese | 65000.0 |
| 2 | Eric Idle | 45000.0 |
| 3 | Terry Jones | 70000.0 |
| 4 | Terry Gilliam | 48000.0 |
| 5 | Michael Palin | 66000.0 |
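The same selection can be written with column positions, and usecols also accepts a function that is called with each column name. A short sketch, assuming a shortened in-memory copy of the data (csv_text is an illustrative name):

```python
import io

import pandas as pd

csv_text = (
    "Name,Hire Date,Salary,Sick Days remaining\n"
    "Graham Chapman,03/15/14,50000.00,10\n"
    "John Cleese,06/01/15,65000.00,8\n"
)

# Positions are 0-based: 0 is 'Name', 2 is 'Salary'.
by_index = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])

# A callable keeps every column for which it returns True.
without_dates = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name != "Hire Date",
)

print(list(by_index.columns))
print(list(without_dates.columns))
```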
13.8. Specifying the data type explicitly while importing#
df = pd.read_csv('demo.csv', dtype={'Salary': float, 'Name': str})
print(df.dtypes)
Name                    object
Hire Date               object
Salary                 float64
Sick Days remaining      int64
dtype: object
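In the example above the requested types match what pandas would infer anyway, so the effect of dtype is easier to see when it overrides the inference. A minimal sketch, assuming a shortened in-memory copy of the data (csv_text is an illustrative name):

```python
import io

import pandas as pd

csv_text = (
    "Name,Hire Date,Salary,Sick Days remaining\n"
    "Graham Chapman,03/15/14,50000.00,10\n"
    "John Cleese,06/01/15,65000.00,8\n"
)

# Default inference: Salary is parsed as a float.
inferred = pd.read_csv(io.StringIO(csv_text))

# Reading Salary as text keeps the value exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"Salary": str})

# Reading an integer-looking column as float instead of the inferred int64.
as_float = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Sick Days remaining": "float64"},
)

print(inferred["Salary"].iloc[0])   # numeric value
print(as_text["Salary"].iloc[0])    # original text, trailing zeros kept
print(as_float["Sick Days remaining"].dtype)
```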
13.9. Parsing date columns as dates instead of objects (the default behaviour)#
df = pd.read_csv('demo.csv', parse_dates=['Hire Date'])
df.dtypes
Name                         object
Hire Date            datetime64[ns]
Salary                      float64
Sick Days remaining           int64
dtype: object
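With parse_dates, pandas has to work out the date layout itself. When the layout is known, one alternative is to read the column as text and convert it afterwards with an explicit format. The sketch below assumes the dates are month/day/two-digit-year, as they appear in the sample data; csv_text is an illustrative name.

```python
import io

import pandas as pd

csv_text = (
    "Name,Hire Date,Salary,Sick Days remaining\n"
    "Graham Chapman,03/15/14,50000.00,10\n"
    "John Cleese,06/01/15,65000.00,8\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Assumption: '%m/%d/%y' (month/day/two-digit year) matches the file.
df["Hire Date"] = pd.to_datetime(df["Hire Date"], format="%m/%d/%y")

# Once the column is a datetime, the .dt accessor exposes its parts.
print(df["Hire Date"].dt.year.tolist())  # [2014, 2015]
```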
13.10. Saving the Dataframe as csv#
df.to_csv('demo2.csv', index=False)  # index=False omits the row index (0, 1, 2, ...), which is otherwise written as the first column
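The effect of index=False is easy to check without writing a file, because to_csv returns the CSV text as a string when no path is given. A small sketch using a hand-built two-row DataFrame (the data here is a cut-down version of the sample, for illustration only):

```python
import io

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Graham Chapman", "John Cleese"],
        "Salary": [50000.0, 65000.0],
    }
)

# No path given, so the CSV text is returned instead of being written to disk.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# Compare the header lines: the default output has an extra, unnamed
# leading column that holds the row index.
print(with_index.splitlines()[0])     # ,Name,Salary
print(without_index.splitlines()[0])  # Name,Salary

# Reading the index-free text back gives the original columns again.
round_trip = pd.read_csv(io.StringIO(without_index))
print(list(round_trip.columns))
```

If a file was saved with the index included, passing index_col=0 to read_csv when reading it back turns that first column into the index again.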