How to read all text files in a GitHub repository?
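I want to read all the text files in a GitHub repository, but the address of a text file on its normal page differs from the address of its raw version.
Trump Speeches
For example, look at this link:
speech_00.txt in first status
In raw mode, speech_00.txt has a different address:
speech_00.txt in raw status
How can I read the files without editing the address (for example, adding
githubusercontent or removing blob)?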
Also, I read a sample text file with the following code:
import urllib.request

response = urllib.request.urlopen("https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/speech_72.txt")
Text = response.read()
Text = Text.decode("utf-8")
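The difference between the two address forms described above is mechanical, so if the concern is editing each address by hand, the conversion can be automated. The following is a minimal sketch, not an official GitHub utility: the helper name `blob_to_raw` is hypothetical, and the page URL shape `/<owner>/<repo>/blob/<branch>/<path>` is an assumption inferred from the raw URL in the question.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw(url):
    """Turn a github.com 'blob' page URL into its raw.githubusercontent.com form."""
    parts = urlsplit(url)
    # Assumed path shape: /<owner>/<repo>/blob/<branch>/<path to file>
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    # Drop the "blob" segment and keep everything else in order
    owner, repo, _, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))
```

Passing an assumed page address such as `https://github.com/PedramNavid/trump_speeches/blob/master/data/speech_00.txt` should yield the matching `raw.githubusercontent.com` address, which can then be given to `urllib.request.urlopen`.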
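One slightly hacky way to do it (it relies on how this particular directory is
structured) is to loop and build each file path by appending the file number
to a base string: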
import urllib.request

# Base URL of the raw data directory
speech_dir = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"

# Iterate through all speeches in the directory, from 00 to 73
cur_speech = 0
end_speech = 73
while cur_speech <= end_speech:
    # Build the full URL; file numbers are zero-padded to two digits
    speech_nm = speech_dir + "speech_" + str(cur_speech).zfill(2) + ".txt"
    response = urllib.request.urlopen(speech_nm)
    # Do what you need to with the speech
    Text = response.read()
    Text = Text.decode("utf-8")
    # Move on to the next speech
    cur_speech += 1
This way, you will go through every speech in that particular directory.
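I used your code (@N.Yasarturk) and edited it to fetch all the files. But my question remains: is there another way (without editing the address) to read these files from a GitHub repository?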
import urllib.request

# Base URL of the raw data directory
speech_dir = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"

# Iterate through all speeches in the directory, from 00 to 73
cur_speech = 0
temp = str(cur_speech)
end_speech = 73
while cur_speech <= end_speech:
    # Zero-pad single-digit numbers so they match the file names
    if cur_speech < 10:
        temp = "0" + str(cur_speech)
    else:
        temp = str(cur_speech)
    speech_nm = speech_dir + "speech_" + temp + ".txt"
    print(speech_nm)
    response = urllib.request.urlopen(speech_nm)
    # Do what you need to with the speech
    Text = response.read()
    Text = Text.decode("utf-8")
    print(Text)
    # Move on to the next speech
    cur_speech += 1
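One possible way to avoid building the addresses at all is to ask GitHub's REST API for the directory listing: the "contents" endpoint returns a JSON array in which each file entry carries a `download_url` field pointing at the raw file. The following is only a sketch under assumptions: the function names are hypothetical, the branch is assumed to be `master` as in the URLs above, and unauthenticated API requests are rate-limited, so this may not suit heavy use.

```python
import json
import urllib.request

# GitHub REST API "get repository content" endpoint for a directory
API_URL = "https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={ref}"


def text_file_urls(listing, suffix=".txt"):
    """Pick the raw download URLs of files ending in `suffix` from a contents listing."""
    return [
        entry["download_url"]
        for entry in listing
        if entry.get("type") == "file" and entry.get("name", "").endswith(suffix)
    ]


def list_directory(owner, repo, path, ref="master"):
    """Fetch the JSON listing of one repository directory (performs a network request)."""
    url = API_URL.format(owner=owner, repo=repo, path=path, ref=ref)
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def read_all_text_files(owner, repo, path, ref="master"):
    """Return {file name: decoded text} for every .txt file in the directory."""
    texts = {}
    for url in text_file_urls(list_directory(owner, repo, path, ref)):
        with urllib.request.urlopen(url) as response:
            texts[url.rsplit("/", 1)[-1]] = response.read().decode("utf-8")
    return texts
```

With these definitions, `read_all_text_files("PedramNavid", "trump_speeches", "data")` would be expected to return a dictionary keyed by file name, without hard-coding the range 00 to 73. The filtering step is kept separate from the network calls so it can be checked on its own.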