Extracting text from some strings with Python
If I have these strings:
string1 = "10th floor, Shindaisou Building, 2-10-7 Dogenzaka, Shibuya-ku , Tokyo 150-0043"
string2 = "2-176-1 Takasu Misato-shi, Saitama-ken, 341-0037 Japan"
string3 = "5-6-60 Higashikonoike-cho, Higashi-Osaka-shi, Osaka 578-0973"
I need to extract the third of the three numbers joined by dashes. The output should look like this:
string1 Output: 7
string2 Output: 1
string3 Output: 60
How can I do this?
Use a regular expression.
For example:
import re
string1 = "10th floor, Shindaisou Building, 2-10-7 Dogenzaka, Shibuya-ku , Tokyo 150-0043"
string2 = "2-176-1 Takasu Misato-shi, Saitama-ken, 341-0037 Japan"
string3 = "5-6-60 Higashikonoike-cho, Higashi-Osaka-shi, Osaka 578-0973"
ptrn = re.compile(r"\d+\-\d+\-(\d+)")
for i in (string1, string2, string3):
    m = ptrn.search(i)
    if m:
        print(m.group(1))
Output:
7
1
60
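The search loop above can be wrapped in a small reusable helper. This is a minimal sketch: the name `third_number` is hypothetical, and returning `None` when a string has no `digits-digits-digits` group is an assumption about how missing matches should be handled.

```python
import re

# Same idea as above: three digit runs joined by dashes, capturing the third
PATTERN = re.compile(r"\d+-\d+-(\d+)")

def third_number(text):
    """Hypothetical helper: third number of the first dash-joined triple, or None."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

addresses = [
    "10th floor, Shindaisou Building, 2-10-7 Dogenzaka, Shibuya-ku , Tokyo 150-0043",
    "2-176-1 Takasu Misato-shi, Saitama-ken, 341-0037 Japan",
    "5-6-60 Higashikonoike-cho, Higashi-Osaka-shi, Osaka 578-0973",
    "Tokyo 150-0043",  # hypothetical input with no dash-joined triple
]
for a in addresses:
    print(third_number(a))
# 7
# 1
# 60
# None
```

Returning `None` instead of silently skipping keeps the results aligned one-to-one with the input strings.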
We can use re.findall here:
import re
string1 = "10th floor, Shindaisou Building, 2-10-7 Dogenzaka, Shibuya-ku , Tokyo 150-0043"
matches = re.findall(r'\b\d+-\d+-(\d+)\b', string1)
print(matches[0])  # prints 7
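Two details of the `re.findall` approach are worth seeing in isolation: `findall` returns every match (not just the first), and `matches[0]` raises `IndexError` when the list is empty. The sketch below guards against the empty case; the helper name `first_third_number` and the sample strings are hypothetical. It also shows the effect of the `\b` word boundaries, which reject a triple glued to letters such as `2-10-7a`.

```python
import re

# \b word boundaries keep the triple from matching inside a longer word
TRIPLE = re.compile(r"\b\d+-\d+-(\d+)\b")

def first_third_number(text):
    """Hypothetical helper: first captured third number, or None if none found."""
    matches = TRIPLE.findall(text)  # list of every captured group-1 string
    return matches[0] if matches else None

print(TRIPLE.findall("1-2-3 and 4-5-6"))      # ['3', '6']
print(TRIPLE.findall("2-10-7a"))              # []
print(first_third_number("Tokyo 150-0043"))   # None
```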
Maybe not the best solution, but it works:
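def function(string):
    # Split the address into whitespace-separated tokens
    for token in string.split():
        # Look for tokens with exactly two dashes, e.g. "2-10-7"
        if token.count('-') == 2:
            result = token.split('-')[2]
            # Skip tokens like "Higashi-Osaka-shi," whose third part is not a number
            if result.isdigit():
                return result
    # No matching token found
    return None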
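One limitation of the split-based approach is that a token with trailing punctuation, such as `2-10-7,`, is skipped because `"7,"` is not all digits. The variant below is a sketch only: the name `third_number_split` is hypothetical, and stripping `,`, `.` and `;` is an assumption about which punctuation appears in these addresses, not a general address parser.

```python
def third_number_split(text):
    """Hypothetical variant: third part of the first token with exactly two
    dashes whose third part is all digits, or None."""
    for token in text.split():
        # Assumption: only these trailing/leading punctuation marks need removing
        token = token.strip(",.;")
        if token.count("-") == 2:
            result = token.split("-")[2]
            if result.isdigit():
                return result
    return None

print(third_number_split("5-6-60 Higashikonoike-cho, Higashi-Osaka-shi, Osaka 578-0973"))  # 60
print(third_number_split("Dogenzaka 2-10-7, Shibuya-ku"))  # 7
```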
Using a regular expression:
import re
pattern = r'\d+\-[0-9]*\-(\d+)'
for i in (string1, string2, string3):
    res = re.findall(pattern, i)
    print(''.join(res))
Output:
# 7
# 1
# 60
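This pattern behaves slightly differently from the `\d+-\d+-(\d+)` versions: `[0-9]*` also accepts an empty middle part, and `''.join` glues multiple matches together with no separator. The inputs below are hypothetical and only illustrate those two edge cases.

```python
import re

pattern = r'\d+\-[0-9]*\-(\d+)'

# [0-9]* matches zero digits, so an empty middle part is accepted
print(re.findall(pattern, "5--60"))  # ['60']

# With two triples in one string, ''.join concatenates the captures
res = re.findall(pattern, "1-2-3 and 4-5-6")
print(''.join(res))    # 36
print(', '.join(res))  # 3, 6

# Requiring at least one digit in the middle rejects "5--60"
print(re.findall(r'\d+-\d+-(\d+)', "5--60"))  # []
```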