    import pandas as pd

    # Load the dataset and preview the first 50 rows
    data = pd.read_csv(r'guazi.csv')
    print(data.head(50))

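The output shows that the data contains missing values, displayed as NaN.
Methods for detecting missing values:
isnull and notnull
Note: both methods also treat Python's None as a missing value.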
    import pandas as pd

    data = pd.read_csv(r'guazi.csv')
    print(data.head(20).isnull())   # True where a value is missing
    print(data.head(20).notnull())  # True where a value is present
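The note that None is counted as missing can be checked on a small example. This is a minimal sketch on made-up data, not on guazi.csv; the Series `s` and its values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one np.nan and one Python None
s = pd.Series([1.0, np.nan, None, 4.0])

mask = s.isnull()              # True where the value is missing
print(mask.tolist())           # [False, True, True, False]
print(s.notnull().tolist())    # [True, False, False, True]

# Summing the boolean mask counts the missing values
print(int(mask.sum()))         # 2
```

Calling `isnull().sum()` on a DataFrame gives the same count per column, which is a quick way to see where the gaps are.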

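Dropping missing values:
dropna() drops every row that contains at least one NaN. To drop columns instead, pass axis=1. The default is axis=0 (rows); axis=1 means columns.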
    import pandas as pd

    data = pd.read_csv(r'guazi.csv')
    data1 = data.head(20)
    print(data1.isnull())
    res_data = data1.dropna()  # drop every row that has at least one NaN
    print(res_data.isnull())

The output confirms that row 14, which contained NaN, has been dropped.
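The difference between the default axis=0 and axis=1 is easier to see on a tiny table. The sketch below uses a hypothetical DataFrame `df` with made-up column names; it is only an illustration of the axis parameter described above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a single NaN in row 1, column "b"
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

rows_dropped = df.dropna()         # axis=0 (default): drop rows containing NaN
cols_dropped = df.dropna(axis=1)   # axis=1: drop columns containing NaN

print(rows_dropped.index.tolist())    # [0, 2]
print(cols_dropped.columns.tolist())  # ['a', 'c']
```

Dropping by column removes a whole feature because of one missing cell, so it is usually worth checking how many values are missing before choosing the axis.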
Dropping only the rows or columns that are entirely NaN:
    import pandas as pd
    import numpy as np

    data = pd.DataFrame([[1., 6.5, 3.],
                         [1., np.nan, np.nan],
                         [np.nan, np.nan, np.nan],
                         [np.nan, 6.7, 7.]])
    print(data)
    print("=" * 30)
    # how='all' drops a row only if every value in it is NaN
    print(data.dropna(how='all'))

    import pandas as pd
    import numpy as np

    data = pd.DataFrame([[1., 6.5, np.nan],
                         [1., np.nan, np.nan],
                         [np.nan, np.nan, np.nan],
                         [np.nan, 6.7, np.nan]])
    print(data)
    print("=" * 30)
    # axis=1 with how='all' drops a column only if every value in it is NaN
    print(data.dropna(axis=1, how='all'))

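Dropping rows based on how many values are missing:
Pass the thresh parameter to dropna(). With thresh=n, a row is kept only if it has at least n non-NaN values; rows with fewer are dropped.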
    import pandas as pd
    import numpy as np

    data = pd.DataFrame([[1., 6.5, np.nan],
                         [1., np.nan, np.nan],
                         [np.nan, np.nan, np.nan],
                         [np.nan, 6.7, np.nan]])
    print(data)
    print("=" * 30)
    # Keep only rows with at least 2 non-NaN values
    print(data.dropna(thresh=2))
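Since thresh counts values that are present rather than values that are missing, it can help to print the per-row count of non-NaN values next to the result. The following sketch reuses the same data as the example above; the variable names `non_nan_per_row`, `kept_2` and `kept_1` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1., 6.5, np.nan],
                     [1., np.nan, np.nan],
                     [np.nan, np.nan, np.nan],
                     [np.nan, 6.7, np.nan]])

# Number of non-NaN values in each row
non_nan_per_row = data.notnull().sum(axis=1)
print(non_nan_per_row.tolist())   # [2, 1, 0, 1]

kept_2 = data.dropna(thresh=2)    # needs at least 2 non-NaN values
kept_1 = data.dropna(thresh=1)    # needs at least 1 non-NaN value

print(kept_2.index.tolist())      # [0]
print(kept_1.index.tolist())      # [0, 1, 3]
```

With thresh=1 the result matches how='all' for this data, because a row survives as soon as it has a single real value.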

Filling missing values:
Fill every NaN with the same value:
    import pandas as pd
    import numpy as np

    data = pd.DataFrame([[1., 6.5, np.nan],
                         [1., np.nan, np.nan],
                         [np.nan, np.nan, np.nan],
                         [np.nan, 6.7, np.nan]])
    print(data)
    print("=" * 30)
    df1 = data.fillna(0)  # replace every NaN with 0
    print(df1)

Fill with a different value per column:
    import pandas as pd
    import numpy as np

    data = pd.DataFrame([[1., 6.5, np.nan],
                         [1., np.nan, np.nan],
                         [np.nan, np.nan, np.nan],
                         [np.nan, 6.7, np.nan]])
    print(data)
    print("=" * 30)
    # Column 0 is filled with 10, column 1 with 20, column 2 with 30
    df1 = data.fillna({0: 10., 1: 20., 2: 30.})
    print(df1)

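For per-column filling, pass a dict whose keys are the column labels (here the default integer labels 0, 1 and 2).
If you do not want to create a second DataFrame, add the parameter inplace=True and the original data is modified directly.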
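The difference between the default call and inplace=True can be sketched by counting the NaN values that remain in the original object. The data and the counter names (`missing_before` and so on) are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1., 6.5, np.nan],
                     [1., np.nan, np.nan]])

missing_before = int(data.isnull().sum().sum())

# Default: fillna returns a new DataFrame and leaves `data` untouched
filled = data.fillna(0)
missing_after_copy = int(data.isnull().sum().sum())

# inplace=True: `data` itself is modified
data.fillna(0, inplace=True)
missing_after_inplace = int(data.isnull().sum().sum())

print(missing_before, missing_after_copy, missing_after_inplace)  # 3 3 0
```

Assigning the result back (`data = data.fillna(0)`) has the same effect on the variable and is often easier to read than inplace=True.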
Fill each NaN with the value from the row above (forward fill):
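    import pandas as pd
    import numpy as np

    data = pd.DataFrame([[1., 6.5, np.nan],
                         [1., np.nan, np.nan],
                         [np.nan, np.nan, np.nan],
                         [np.nan, 6.7, np.nan]])
    print(data)
    print("=" * 30)
    # Forward fill: copy the value from the row above, at most 1 NaN in a row.
    # Same result as data.fillna(method='ffill', limit=1), a form that
    # newer pandas versions deprecate in favor of ffill().
    df1 = data.ffill(limit=1)
    print(df1)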

The limit parameter sets the maximum number of consecutive NaN values that are filled by copying downward.
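The effect of limit shows up most clearly on a run of several NaN values in a row. The sketch below uses a hypothetical Series `s` and counts how many NaN values are left after each call.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three consecutive NaN values after the first entry
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

limit_1 = s.ffill(limit=1)   # fills only the first NaN of the run
limit_2 = s.ffill(limit=2)   # fills the first two
no_limit = s.ffill()         # fills the whole run

print(limit_1.isnull().tolist())    # [False, False, True, True, False]
print(int(limit_2.isnull().sum()))  # 1
print(int(no_limit.isnull().sum())) # 0
```

Note that a NaN in the very first row has nothing above it to copy, so forward fill leaves it as NaN regardless of limit.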