Setting a unique abbreviation for every column in Python
I have data like this in a CSV file:
Ad Group
Annuity Calculator
Tax Deferred Annuity
Annuity Tables
annuities calculator
annuity formula
Annuities Explained
Deferred Annuies Calculator
Current Annuity Rates
Forbes.com
Annuity Definition
fixed income
Immediate fixed Annuities
Deferred Variable Annuities
401k Rollover
Deferred Annuity Rates
Deferred Annuities
Immediate Annuities Definition
Immediate Variable Annuities
Variable Annuity
Aig Annuities
Retirement Income
retirment system
Online Financial Planner
Certified Financial Planner
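I want to set a unique abbreviation for every column. For example:
- Annuity Calculator = annca
- annuities calculator = annsca
Can you help me figure out the best way to do this in Python?
Thanks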
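Your question isn't fully specified, but it looked interesting, so I gave it a try. I wrote a function that takes a list of phrases and returns a dictionary with the abbreviations as keys. It starts by taking the first two letters of each word and joining them into a candidate abbreviation. If that abbreviation has already been used, it takes progressively more letters from the start of each word, one word at a time, until the abbreviation is unique. I then tested it on your sample data. You will almost certainly want to modify it, but it should give you some ideas: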
def makeAbbreviations(headers):
    abbreviations = {}
    for header in headers:
        header = header.lower()
        words = header.split()
        n = max(len(w) for w in words)  # length of the longest word
        i = 2
        # first candidate: the first two letters of every word
        starts = [w[:i] for w in words]
        abbrev = ''.join(starts)
        # on a collision, lengthen the prefixes one word at a time
        while abbrev in abbreviations and i <= n:
            i += 1
            for j, w in enumerate(words):
                starts[j] = w[:i]
                abbrev = ''.join(starts)
                if abbrev not in abbreviations:
                    break
        abbreviations[abbrev] = header
    return abbreviations

myHeaders = ['Ad Group', 'Annuity Calculator', 'Tax Deferred Annuity',
             'Annuity Tables', 'annuities calculator', 'annuity formula',
             'Annuities Explained', 'Deferred Annuies Calculator',
             'Current Annuity Rates', 'Forbes.com', 'Annuity Definition',
             'fixed income', 'Immediate fixed Annuities',
             'Deferred Variable Annuities', '401k Rollover',
             'Deferred Annuity Rates', 'Deferred Annuities',
             'Immediate Annuities Definition', 'Immediate Variable Annuities',
             'Variable Annuity', 'Aig Annuities', 'Retirement Income', 'retirment system',
             'Online Financial Planner', 'Certified Financial Planner']

d = makeAbbreviations(myHeaders)
for k, v in d.items():
    print(k, v, sep=" = ")
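Output (the order shown is arbitrary; on Python 3.7+ the entries print in the order of the input list):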
imande = immediate annuities definition
adgr = ad group
fiin = fixed income
40ro = 401k rollover
resy = retirment system
vaan = variable annuity
devaan = deferred variable annuities
rein = retirement income
imvaan = immediate variable annuities
fo = forbes.com
imfian = immediate fixed annuities
dean = deferred annuities
anca = annuity calculator
cuanra = current annuity rates
annca = annuities calculator
onfipl = online financial planner
aian = aig annuities
ande = annuity definition
anfo = annuity formula
cefipl = certified financial planner
tadean = tax deferred annuity
deanca = deferred annuies calculator
anex = annuities explained
anta = annuity tables
deanra = deferred annuity rates
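One thing to watch for if you adapt this: the dictionary is keyed by abbreviation, so when the prefixes run out before a unique abbreviation is found (for example, a short header such as 'ID' appearing twice), the later header silently overwrites the earlier one. Below is a minimal sketch of one possible variant that keeps the same prefix-lengthening idea but falls back on a numeric suffix. The names `candidates` and `make_unique_abbreviations`, the `'col'` placeholder for blank headers, and the suffix scheme are all my own assumptions, not part of the function above.

```python
def candidates(words):
    """Yield candidate abbreviations, lengthening one word's prefix at a time."""
    longest = max(len(w) for w in words)
    starts = [w[:2] for w in words]
    yield ''.join(starts)
    for i in range(3, longest + 1):
        for j, w in enumerate(words):
            starts[j] = w[:i]
            yield ''.join(starts)

def make_unique_abbreviations(headers):
    """Return (header, abbreviation) pairs in input order, all abbreviations unique."""
    used = set()
    pairs = []
    for header in headers:
        # assumption: a blank header gets the placeholder word 'col'
        words = header.lower().split() or ['col']
        abbrev = next((c for c in candidates(words) if c not in used), None)
        if abbrev is None:
            # prefixes exhausted: append a counter to the full lowercase text
            base = ''.join(words)
            k = 2
            while f'{base}{k}' in used:
                k += 1
            abbrev = f'{base}{k}'
        used.add(abbrev)
        pairs.append((header, abbrev))
    return pairs

sample = ['Annuity Calculator', 'annuities calculator', 'ID', 'ID']
for header, abbrev in make_unique_abbreviations(sample):
    print(header, abbrev, sep=" = ")
# Annuity Calculator = anca
# annuities calculator = annca
# ID = id
# ID = id2
```

Returning pairs rather than a dictionary keeps repeated headers from colliding as keys, and it preserves the column order, which is handy if you want to write the abbreviations back as a new header row.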