Index words by line number from a text file
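My homework problem is to write a function, lineIndex, that indexes the words in a text file and returns a list of line numbers for each word. The whole result must be returned as a dictionary.
For example, suppose the text file contains: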
I have no pride
I have no shame
You gotta make it rain
Make it rain rain rain
My professor wants the output to look like this:
{'rain': [2, 3], 'gotta': [2], 'make': [2], 'it': [2, 3], 'shame': [1], 'I': [0, 1], 'You': [2], 'have': [0, 1], 'no': [0, 1], 'Make': [3], 'pride': [0]}
For example, the word 'rain' appears on lines 2 and 3. (Line numbering always starts at zero.)
Here is my code so far, but I need help with the algorithm.
def lineIndex(fName):
    d = {}
    with open(fName, 'r') as f:
        # algorithm goes here
        pass
    return d

print(lineIndex('index.txt'))
Try this:
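def lineIndex(fName):
    dic = {}
    i = 0
    with open(fName, 'r') as f:
        while True:
            x = f.readline()
            if not x:
                break
            # split the line into words; iterating over x directly would yield characters
            for j in x.split():
                if j in dic:
                    dic[j].add(i)
                else:
                    dic[j] = set()
                    dic[j].add(i)
            i += 1  # increment after processing so the first line is numbered 0
    # convert each set to a sorted list to match the required output
    return {word: sorted(lines) for word, lines in dic.items()}

print(lineIndex("index.txt"))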
Here is a simple approach using sets. I'll leave adapting it to read from a file as an exercise for you.
In [14]: text = """I have no pride
    ...: I have no shame
    ...: You gotta make it rain
    ...: Make it rain rain rain"""

In [15]: from collections import defaultdict

In [16]: d = defaultdict(set)

In [17]: for i, line in enumerate(text.split('\n')):
    ...:     for each_word in line.split(' '):
    ...:         d[each_word].add(i)
    ...:

In [18]: d
Out[18]:
defaultdict(set,
            {'I': {0, 1},
             'Make': {3},
             'You': {2},
             'gotta': {2},
             'have': {0, 1},
             'it': {2, 3},
             'make': {2},
             'no': {0, 1},
             'pride': {0},
             'rain': {2, 3},
             'shame': {1}})
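For anyone who wants to compare notes after trying the exercise, here is one possible way to carry the same idea over to a file. This is only a sketch, not the answerer's own solution: the function name `line_index_from_file` and its `path` parameter are made up for illustration, and it assumes words are separated by whitespace and that the sets should be turned into sorted lists to match the format the professor asked for.

```python
from collections import defaultdict


def line_index_from_file(path):
    """Map each whitespace-separated word to a sorted list of 0-based line numbers."""
    index = defaultdict(set)
    with open(path, "r") as f:
        # Iterating over the file object yields one line at a time;
        # enumerate supplies the line number, starting at 0.
        for line_number, line in enumerate(f):
            # split() with no argument also discards the trailing newline.
            for word in line.split():
                index[word].add(line_number)
    # Convert the sets to sorted lists (assumed output format, see above).
    return {word: sorted(numbers) for word, numbers in index.items()}
```

Calling `line_index_from_file('index.txt')` on the sample file from the question should give a plain dict whose values are lists such as `[2, 3]` for 'rain'. Using a set while collecting means a word repeated on one line is still recorded only once.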
This is my first time writing anything in Python, but this works:
def lineIndex(fName):
    with open(fName, 'r') as f:
        content = f.readlines()
    lnc = 0  # current line number, starting at zero
    result = {}
    for line in content:
        # split() with no argument drops the trailing newline and ignores repeated spaces
        words = line.split()
        for word in words:
            if word not in result:
                result[word] = []
            # record each line number only once per word
            if lnc not in result[word]:
                result[word].append(lnc)
        lnc = lnc + 1
    return result

print(lineIndex('index.txt'))
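If you want to check a lineIndex implementation against the dictionary the professor expects, something like the following may help. It is a rough sketch under a few assumptions: `SAMPLE`, `EXPECTED` and `check` are names invented here, the sample text is written to a temporary file rather than index.txt, and the `lineIndex` shown is just a compact stand-in so the snippet runs on its own. Swap in whichever version you are testing.

```python
import os
import tempfile

# The four lines from the question (hypothetical constant name).
SAMPLE = (
    "I have no pride\n"
    "I have no shame\n"
    "You gotta make it rain\n"
    "Make it rain rain rain\n"
)

# The dictionary the professor expects, copied from the question.
EXPECTED = {
    'rain': [2, 3], 'gotta': [2], 'make': [2], 'it': [2, 3], 'shame': [1],
    'I': [0, 1], 'You': [2], 'have': [0, 1], 'no': [0, 1], 'Make': [3],
    'pride': [0],
}


def lineIndex(fName):
    # Compact stand-in implementation; replace with the one under test.
    result = {}
    with open(fName, 'r') as f:
        for lnc, line in enumerate(f):
            for word in line.split():
                lines = result.setdefault(word, [])
                if lnc not in lines:
                    lines.append(lnc)
    return result


def check(index_func):
    """Write SAMPLE to a temporary file and compare index_func's result with EXPECTED."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(SAMPLE)
        path = tmp.name
    try:
        return index_func(path) == EXPECTED
    finally:
        os.remove(path)  # clean up the temporary file


print(check(lineIndex))  # prints True
```

Dictionary comparison ignores key order, so the check passes regardless of the order in which words were first seen. The lists themselves must still be in ascending order.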