Combining values in a tuple that contains both strings and integers and storing them as a dataframe
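I'm building a function that takes a list of company descriptions and outputs the most common three-word phrases in the list. I've been able to get it to output a dictionary of tuples structured like this: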
{('technology', 'company', 'provides'): 2,
('various', 'industries.', 'company'): 2,
('provides', 'software', 'solutions'): 2,
('life', 'health', 'insurance'): 2,...}
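I'd like to convert this into a table/dataframe, joining the strings of each tuple into a single value and storing the number of occurrences of the phrase in a separate column.
The ideal output would be: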
Phrase | Occurrence |
---|---|
technology company provides | 2 |
various industries company | 2 |
provides software solutions | 2 |
life health insurance | 2 |
I tried the following to combine each tuple into a single string, but it dropped the occurrence counts:
# function that converts tuple to string
def join_tuple_string(descriptions) -> str:
    return ' '.join(descriptions)

# joining all the tuples
result = map(join_tuple_string, descriptions)
# converting and printing the result
print(list(result))
This is the output:
['technology company provides',
'provides software solutions',
'product suite includes', 'life health insurance',...]
How can I join the values without losing the occurrence counts? I'd like to be able to export the result to a CSV file to view the full list.
Given:
din = {('technology', 'company', 'provides'): 2,
('various', 'industries.', 'company'): 2,
('provides', 'software', 'solutions'): 2,
('life', 'health', 'insurance'): 2}
I would proceed as follows:
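def reportValues(d):
    result = []
    # d.items() yields (tuple, count) pairs, so the count stays with its phrase
    for ky, v in d.items():
        result.append([' '.join(ky), v])
    return result

result = reportValues(din)
for r in result:
    # pad the phrase to 25 characters so the counts line up
    print(f'{r[0]:25}\t{r[1]}')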
This yields:
technology company provides 2
various industries. company 2
provides software solutions 2
life health insurance 2
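The question also asks for a CSV export, which the loop above does not cover. The original attempt lost the counts because mapping over a dictionary iterates over its keys only. A minimal sketch of the export using only the standard library follows. The names `phrase_rows`, `write_phrases_csv` and the file name `phrases_stdlib.csv` are hypothetical, chosen for illustration.

```python
import csv

def phrase_rows(counts):
    """Turn {(word, word, word): count} into [phrase, count] rows."""
    return [[' '.join(words), n] for words, n in counts.items()]

def write_phrases_csv(counts, fh):
    """Write a header line plus one phrase/count row per entry to an open file."""
    writer = csv.writer(fh)
    writer.writerow(['phrase', 'occurrence'])
    writer.writerows(phrase_rows(counts))

din = {('technology', 'company', 'provides'): 2,
       ('various', 'industries.', 'company'): 2,
       ('provides', 'software', 'solutions'): 2,
       ('life', 'health', 'insurance'): 2}

# newline='' lets the csv module control line endings itself
with open('phrases_stdlib.csv', 'w', newline='', encoding='utf-8') as fh:
    write_phrases_csv(din, fh)
```

Passing an open file object rather than a path keeps the writer easy to point at something else, such as an in-memory buffer.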
import pandas as pd

result = {('technology', 'company', 'provides'): 2,
          ('various', 'industries.', 'company'): 2,
          ('provides', 'software', 'solutions'): 2,
          ('life', 'health', 'insurance'): 2}

# each (tuple, count) item becomes one row
df = pd.DataFrame(list(result.items()), columns=['phrase', 'occurrence'])
# join the words of each tuple into a single space-separated string
df['phrase'] = df['phrase'].str.join(' ')
print(df)
# write the table to disk without the index column
df.to_csv('phrases.csv', index=False)
Output:
phrase occurrence
0 technology company provides 2
1 various industries. company 2
2 provides software solutions 2
3 life health insurance 2
The CSV file:
phrase,occurrence
technology company provides,2
various industries. company,2
provides software solutions,2
life health insurance,2
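Since the goal is the most common phrases, it may help to sort the table by count before exporting it. The sketch below is one way to do that with pandas. The counts are made up for illustration (the real data shown above all have a count of 2), and the variable names `counts` and `top` are hypothetical.

```python
import pandas as pd

# Hypothetical counts, invented so that the sort has something to reorder.
counts = {
    ('technology', 'company', 'provides'): 2,
    ('life', 'health', 'insurance'): 5,
    ('provides', 'software', 'solutions'): 3,
}

# Build the rows directly as (phrase, count) pairs.
df = pd.DataFrame(
    [(' '.join(words), n) for words, n in counts.items()],
    columns=['phrase', 'occurrence'],
)

# Highest count first; a stable sort keeps tied phrases in their original order.
top = (df.sort_values('occurrence', ascending=False, kind='stable')
         .reset_index(drop=True))
print(top)
```

Calling `top.to_csv(...)` afterwards would write the rows in this sorted order.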