3.7. Running SQL in Pandas#
!pip install pandasql
import pandas as pd
Requirement already satisfied: pandasql in /Users/sahilchoudhary/opt/anaconda3/lib/python3.9/site-packages (0.7.3)
Requirement already satisfied: pandas in /Users/sahilchoudhary/opt/anaconda3/lib/python3.9/site-packages (from pandasql) (1.4.2)
Requirement already satisfied: sqlalchemy in /Users/sahilchoudhary/opt/anaconda3/lib/python3.9/site-packages (from pandasql) (1.4.32)
Requirement already satisfied: numpy in /Users/sahilchoudhary/opt/anaconda3/lib/python3.9/site-packages (from pandasql) (1.22.4)
Requirement already satisfied: python-dateutil>=2.8.1 in /Users/sahilchoudhary/opt/anaconda3/lib/python3.9/site-packages (from pandas->pandasql) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /Users/sahilchoudhary/opt/anaconda3/lib/python3.9/site-packages (from pandas->pandasql) (2021.3)
Requirement already satisfied: greenlet!=0.4.17 in /Users/sahilchoudhary/opt/anaconda3/lib/python3.9/site-packages (from sqlalchemy->pandasql) (1.1.1)
Requirement already satisfied: six>=1.5 in /Users/sahilchoudhary/opt/anaconda3/lib/python3.9/site-packages (from python-dateutil>=2.8.1->pandas->pandasql) (1.16.0)
[notice] A new release of pip is available: 23.0 -> 23.0.1
[notice] To update, run: pip install --upgrade pip
from pandasql import sqldf
pdsql = lambda q: sqldf(q, globals())
# Just write the sql query in the arguments of this pdsql object
Pandasql allows you to query pandas DataFrames using SQL syntax
Useful for cleaning and filtering
df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
'num_wings': [2, 0, 0, 0],
'num_specimen_seen': [10, 2, 1, 8]})
df
num_legs | num_wings | num_specimen_seen | |
---|---|---|---|
0 | 2 | 2 | 10 |
1 | 4 | 0 | 2 |
2 | 8 | 0 | 1 |
3 | 0 | 0 | 8 |
df1=pdsql("SELECT * FROM df where num_legs>4")
df1
num_legs | num_wings | num_specimen_seen | |
---|---|---|---|
0 | 8 | 0 | 1 |