All NaN values when filtering a pandas DataFrame to one column
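I am importing data from a .csv file and storing it in a DataFrame. At that point the data looks fine.
After that, I try to store just one column of the DataFrame elsewhere. However, it returns all NaN values.
The exact same code works for an .xls file earlier in the same Python script, so I'm not sure what is happening here. Any clarification would be greatly appreciated. Here is the source code: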
import os
import time
import tkinter
from tkinter import filedialog

import pandas as pd

# ------------------------------------------------------------------------------
print("\nSELECT Q MEASUREMENT FILE TO FIX: ")
time.sleep(1)
# Allow the user to pick the file whose X-Y data needs to be fixed
tkinter.Tk().withdraw() # Hide the root window
input2 = filedialog.askopenfilename()
print("\nYou selected file:")
print(input2)
print("\n")
input2 = str(input2)
# Check to see if directory/file exists
assert os.path.exists(input2), "File does not exist at, "+str(input2)
# Import data below and store in df
print("\nImporting Excel Workbook...")
time.sleep(1)
# You can check encoding of file with notepad++
dfQ = pd.read_csv(input2, encoding="ansi")
dfQ.values
print(dfQ) # This DataFrame (dfQ) contains the entire excel workbook
print("\n\nWorkbook Successfully Imported")
time.sleep(.5)
print("...")
# Search Q measurements CSV for "Chip ID" and matches it to corresponding
# "PartID" in the master table created from manually fixed file.
print("Matching PartID's to update proper X-Y values")
time.sleep(.5)
print("...")
IDs = pd.DataFrame(dfQ, columns=['Chip ID'])
time.sleep(.5)
print(IDs)
s = IDs.size
print("\nSuccessfully extracted", s, "Chip ID's!")
print(dfQ.columns)
All you have to do is:
IDs = dfQ["Chip ID"]
You will get the corresponding pandas.Series. If you want the result as a pandas.DataFrame instead, do the following:
IDs = dfQ["Chip ID"].to_frame()
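As a minimal sketch of the difference between the two forms, the following uses a small made-up DataFrame (the data and the name `df` are assumptions for illustration, and the column name here has no leading space):

```python
import pandas as pd

# Hypothetical toy data, not the asker's file.
df = pd.DataFrame({"Chip ID": ["A1", "B2", "C3"], "X": [1, 2, 3]})

as_series = df["Chip ID"]            # single label -> pandas.Series
as_frame = df["Chip ID"].to_frame()  # Series converted to a one-column DataFrame
also_frame = df[["Chip ID"]]         # list of labels -> DataFrame directly

print(type(as_series).__name__)
print(type(as_frame).__name__, as_frame.shape)
```

Unlike pd.DataFrame(dfQ, columns=[...]), indexing with a label that does not exist raises a KeyError, so a misspelled column name is noticed immediately.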
Edit:
Your column names start with a space:
Index(['Date', ' Time', ' Device ID', ' Chip ID', ' Lot', ' Wafer', ' X', ' Y',
' Q half-width', ' Q fit', ' dQ¸ %', ' Internal Resonant F',
' Internal Resonant A', ' Ajusted FG Ampl', ' FG Amplitude (0.10)',
' Forced A', ' Forced F', ' Drive Gain', ' Frequency sweep¸start',
' Prelim Q half-width', ' Prelim Q fit', ' Prelim Q Error¸ %',
' Execution time', ' Preliminary F', ' residue', ' '],
dtype='object')
So what you have to do is:
IDs = dfQ[" Chip ID"]
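The sketch below illustrates why the original call probably produced NaN instead of an error, and one way to normalize the headers. It assumes a small made-up DataFrame with a leading space in one label. Stripping the labels with `columns.str.strip()` is an alternative to typing the space and is not part of the answer above.

```python
import pandas as pd

# Hypothetical toy frame whose header has a leading space, as in the question.
df = pd.DataFrame({"Date": ["2020-01-01", "2020-01-02"], " Chip ID": ["A1", "B2"]})

# Passing an existing DataFrame together with columns= reindexes its columns.
# "Chip ID" (no space) does not exist, so an all-NaN column is created silently.
missing = pd.DataFrame(df, columns=["Chip ID"])
print(missing["Chip ID"].isna().all())

# Stripping whitespace from every label makes the intended name usable.
df.columns = df.columns.str.strip()
ids = df["Chip ID"]
print(ids.tolist())
```

Note that a label consisting only of a space, like the last one in the Index shown above, becomes an empty string after stripping.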
The problem is that your column is actually named " Chip ID" (with a leading space), not "Chip ID".
So IDs = dfQ[" Chip ID"] for a Series, or IDs = dfQ[[" Chip ID"]] for a DataFrame, should both work.
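If the spaces come from a ", " delimiter in the file, another option is to drop them while reading. This is a sketch under that assumption, using made-up CSV text in place of the real file. `skipinitialspace` is an existing `pandas.read_csv` parameter, but whether it suits the asker's file is not confirmed by the thread.

```python
import io

import pandas as pd

# Hypothetical CSV text with a space after each comma, mimicking the headers in the question.
csv_text = "Date, Time, Chip ID\n2020-01-01, 12:00, A1\n2020-01-02, 12:05, B2\n"

# Default parsing keeps the space as part of each label and value.
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# skipinitialspace=True skips spaces that directly follow the delimiter.
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
ids = clean[["Chip ID"]]  # one-column DataFrame
print(list(clean.columns))
```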