Turn string representation of interval into actual interval in pandas
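My question is fairly simple, but I'm not sure there is a way to do what I'm after:
I have to store some data in a SQL database, including some intervals that will be used later. Because of this, I had to store them as strings, like this: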
variable interval
A (-0.001, 2.0]
A (2.0, 6.0]
I then want to use these intervals to cut another variable, like this:
import pandas as pd

df1 = pd.DataFrame({'interval': {4: '(-0.001, 2.0]', 5: '(2.0, 6.0]'},
                    'variable': {4: 'A', 5: 'A'}})
df2 = pd.DataFrame({'A': [1, 1, 3]})
bins = df1[df1.variable.eq('A')].interval
new_series = pd.cut(df2['A'], bins=bins)
But this raises:
ValueError: could not convert string to float: '(-0.001, 2.0]'
I also tried:
bins = bins.astype('interval')
But this raises:
TypeError: type <class 'str'> with value (-0.001, 2.0] is not an interval
Is there anything I can do? Thanks.
IIUC, you can parse the strings manually and then convert the bins to an IntervalIndex:
import ast
import pandas as pd

def interval_type(s):
    """Parse interval string to Interval"""
    # Swap square brackets for parentheses so the string is a valid tuple literal
    table = str.maketrans({'[': '(', ']': ')'})
    left_closed = s.startswith('[')
    right_closed = s.endswith(']')
    left, right = ast.literal_eval(s.translate(table))
    # Map the bracket style onto the `closed` argument of pd.Interval
    t = 'neither'
    if left_closed and right_closed:
        t = 'both'
    elif left_closed:
        t = 'left'
    elif right_closed:
        t = 'right'
    return pd.Interval(left, right, closed=t)

df1 = pd.DataFrame({'interval': {4: '(-0.001, 2.0]', 5: '(2.0, 6.0]'},
                    'variable': {4: 'A', 5: 'A'}})
df1['interval'] = df1['interval'].apply(interval_type)
df2 = pd.DataFrame({'A': [1, 1, 3]})
bins = df1[df1.variable.eq('A')].interval
new_series = pd.cut(df2['A'], bins=pd.IntervalIndex(bins))
print(new_series)
Output:
0 (-0.001, 2.0]
1 (-0.001, 2.0]
2 (2.0, 6.0]
Name: A, dtype: category
Categories (2, interval[float64]): [(-0.001, 2.0] < (2.0, 6.0]]
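If all the stored strings use the same bracket style (as they do in the question), a possible alternative to the row-by-row parser is to split the strings with a regular expression and build the index in one step with pd.IntervalIndex.from_arrays. This is only a sketch under that assumption: the helper name parse_intervals and the regex are made up for illustration and are not part of pandas, and mixed bracket styles are rejected because an IntervalIndex carries a single closed value for all its intervals.

```python
import pandas as pd

# Hypothetical helper (not a pandas function): lookup from bracket style to `closed`
_CLOSED = {(True, True): "both", (True, False): "left",
           (False, True): "right", (False, False): "neither"}

def parse_intervals(strings):
    """Parse strings like '(-0.001, 2.0]' into a pd.IntervalIndex.

    Assumes every string uses the same opening and closing bracket.
    """
    s = pd.Series(strings, dtype="object").str.strip()
    # Groups: opening bracket, left bound, right bound, closing bracket
    parts = s.str.extract(r"^([\[(])\s*(.+?)\s*,\s*(.+?)\s*([\])])$")
    if parts.isna().any().any():
        raise ValueError("at least one string is not a valid interval")
    opening, closing = set(parts[0]), set(parts[3])
    if len(opening) != 1 or len(closing) != 1:
        raise ValueError("all intervals must share the same bracket style")
    closed = _CLOSED[(opening == {"["}, closing == {"]"})]
    return pd.IntervalIndex.from_arrays(
        parts[1].astype(float).to_numpy(),
        parts[2].astype(float).to_numpy(),
        closed=closed,
    )

bins = parse_intervals(["(-0.001, 2.0]", "(2.0, 6.0]"])
new_series = pd.cut(pd.Series([1, 1, 3], name="A"), bins=bins)
print(new_series)
```

The result should match the per-row approach for this data. The per-row parser is still the safer choice when bracket styles may differ between rows.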