Error: could not convert string to float: ''

Question

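I have one column that refuses to change from object to float. The data from the xlsx file is always presented in the same way (as numbers), but for some reason only this column is treated as object.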

The numbers in the column are percentages, with a dot (.) as the decimal separator.

xls3[' Vesturland'] = xls3[' Vesturland'].astype(float)

does not work. There are no special characters to replace (e.g. with str.replace()), and I have tried that as well.

I am reluctant to use

xls3[' Vesturland'] = pd.to_numeric(xls3[' Vesturland'])

because it changes all the floats to NaN, and the whole column consists of percentage values.

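The only thing I can think of is that the number of decimal places is inconsistent, but that shouldn't matter, should it? I put a red arrow on the column I want to change to float.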

I only get this error, could not convert string to float: '', when I try to convert to float, and searching for it in relation to my specific problem has not turned up anything.

Answer 1

Could it be that you are looping with for i in range(0, len(tablename))?

You need len(tablename)-1

because you start from 0?

Answer 2

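Your pd.Series contains empty strings, which cannot be converted to the float data type directly. What you can do is check for them and remove them. An example script: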

import pandas as pd

a=pd.DataFrame([['a','b','c'],['2.42','','3.285']]).T
a.columns=['names', 'nums']

# Cast only the non-empty strings; the rows that were filtered out become NaN on assignment.
a['nums'] = a['nums'][a['nums'] != ''].astype(float)

Note: if you run a['nums']=a['nums'].astype(float) without first selecting the non-empty strings, it throws the same error you mentioned.
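As a possible alternative to filtering by hand, here is a minimal sketch that lets pandas do the coercion, assuming it is acceptable for the blank cells to end up as NaN. It reuses the toy frame from this answer; pd.to_numeric with errors='coerce' replaces anything it cannot parse with NaN instead of raising.

```python
import pandas as pd

# Same toy data as in the example above: one empty string among numeric strings.
a = pd.DataFrame({'names': ['a', 'b', 'c'], 'nums': ['2.42', '', '3.285']})

# errors='coerce' turns every value that cannot be parsed into NaN
# instead of raising an exception.
a['nums'] = pd.to_numeric(a['nums'], errors='coerce')

print(a['nums'].dtype)
print(a['nums'].isna().sum())  # number of cells that could not be parsed
```

Counting the NaN values afterwards is a cheap way to see how many cells were not numeric before deciding whether to drop or fill them.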

Answer 3

First, get the current data types with this line:

col_dtypes = dict([(k, v.name) for k, v in dict(df.dtypes).items()])

Like this:

xls3 = pd.read_csv('path/to/file')
col_dtypes = dict([(k, v.name) for k, v in dict(xls3.dtypes).items()])
print(col_dtypes)

Copy the printed value. It should look something like this:

{'Date': 'object', 'Karlar': 'float64', 'Konur': 'float64', ' Vesturland': 'object', ...}

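Then, for each column whose data type you know should not be object, change it to the desired type ('int32', 'int64', 'float32' or 'float64'). Example: the data types might be detected as: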

{'Date': 'object', 'Karlar': 'float64', 'Konur': 'float64', ' Vesturland': 'object', ...}

If we know that Vesturland should be float, we can edit it to:

col_dtypes = {
    'Date': 'object', 'Karlar': 'float64', 'Konur': 'float64', 
    ' Vesturland': 'float64', ...
}

Now, with this snippet, you can find the non-numeric values:

def clean_non_numeric_values(series, col_type):
    # Collect the positions of the values that cannot be cast to col_type.
    illegal_value_pos = []
    for i in range(len(series)):
        try:
            # .iloc is positional, so this also works when the index is not 0..n-1.
            if col_type in ('int32', 'int64'):
                int(series.iloc[i])
            elif col_type in ('float32', 'float64'):
                float(series.iloc[i])
        except (ValueError, TypeError, OverflowError):
            illegal_value_pos.append(i)
            # series.iloc[i] = None # We can set the illegal values to None
            # to remove them later using xls3.dropna()
    return series, illegal_value_pos



# Now we will manually replace the dtype of the column Vesturland like so:
col_dtypes = {
    'Date': 'object', 'Karlar': 'float64', 'Konur': 'float64', 
    ' Vesturland': 'float64'
}

for col in list(xls3.columns):
    if col_dtypes.get(col) in ('int32', 'int64', 'float32', 'float64'):
        series, illegal_value_pos = (
            clean_non_numeric_values(series=xls3[col], col_type=col_dtypes[col])
        )
        xls3[col] = series
        print(illegal_value_pos)
        if illegal_value_pos:
            illegal_rows = xls3.iloc[illegal_value_pos]
            # This will print all the illegal values.
            print(illegal_rows[col])

You can now use this information to remove the non-numeric values from the DataFrame.

Warning: because this uses a for loop it is slow, but it will help you remove the unwanted values.
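If the loop turns out to be too slow, the same positions can probably be found without iterating in Python. The following is only a sketch: the frame and its values are made up for illustration, and it swaps the per-value try/except for pd.to_numeric with errors='coerce', then flags the rows that held a value but failed to parse.

```python
import pandas as pd

# Hypothetical stand-in for the real column; the values are invented.
xls3 = pd.DataFrame({' Vesturland': ['12.5', '', '7.25', ' ', 'n/a', '3']})

col = xls3[' Vesturland']
converted = pd.to_numeric(col, errors='coerce')

# True where the original cell held something that did not parse as a number.
bad_mask = converted.isna() & col.notna()
illegal_value_pos = [i for i, bad in enumerate(bad_mask) if bad]

print(illegal_value_pos)
print(col[bad_mask])  # the offending values themselves
```

Printing the offending values shows whether they are empty strings, stray spaces or placeholder text, which tells you whether dropping or replacing them is appropriate.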

Answer 4

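After a lot of trial and error, I ended up opening the Excel sheet and deleting roughly 10 rows below the last data entry. Then I unfroze the rows/columns, read the file into Jupyter Notebook again, and now all the data is float. I don't know which of the two did the trick, but it is solved now. Thanks to everyone for taking the time to help me with this.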

Answer 5

len([x for x in xls3[' Vesturland'] if x == ' '])

Sometimes the culprit is whitespace. Open your CSV file in Excel and use the Ctrl+Shift+L filter to check for blanks and stray spaces.
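The one-liner above only matches cells that are exactly one space. A slightly broader check, sketched here on invented data, strips whitespace first so that empty strings and runs of several spaces are counted too. The astype(str) call is an assumption to keep the .str accessor working on a column that mixes strings and numbers.

```python
import pandas as pd

# Hypothetical column mixing numeric strings, blanks and a real float.
xls3 = pd.DataFrame({' Vesturland': ['12.5', ' ', '', '  ', 7.25]})

col = xls3[' Vesturland']

# Convert everything to str, strip surrounding whitespace,
# and flag the cells that are left empty.
blank_mask = col.astype(str).str.strip().eq('')
n_blank = int(blank_mask.sum())

print(n_blank)
print(xls3.index[blank_mask].tolist())  # index labels of the blank cells
```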

python

pandas

jupyter