Extracting text from some strings with Python

Suppose I have these strings:

string1 = "10th floor, Shindaisou Building, 2-10-7 Dogenzaka, Shibuya-ku , Tokyo 150-0043"
string2 = "2-176-1 Takasu Misato-shi, Saitama-ken, 341-0037 Japan"
string3 = "5-6-60 Higashikonoike-cho, Higashi-Osaka-shi, Osaka 578-0973"

I need to extract the third number from the group of three numbers joined by dashes. The output should look like this:

string1 Output: 7
string2 Output: 1
string3 Output: 60

How can I do this?

Use a regular expression.

For example:

import re

string1 = "10th floor, Shindaisou Building, 2-10-7 Dogenzaka, Shibuya-ku , Tokyo 150-0043"
string2 = "2-176-1 Takasu Misato-shi, Saitama-ken, 341-0037 Japan"
string3 = "5-6-60 Higashikonoike-cho, Higashi-Osaka-shi, Osaka 578-0973"

ptrn = re.compile(r"\d+\-\d+\-(\d+)")
for i in (string1, string2, string3):
    m = ptrn.search(i)
    if m:
        print(m.group(1))

Output:

7
1
60
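
To make the pattern easier to read, here is a minimal sketch of the same regex written with re.VERBOSE so that each part carries a comment. The function name third_number is just an illustrative choice, not something from the answer above.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE
ptrn = re.compile(r"""
    \d+     # first number
    -       # dash
    \d+     # second number
    -       # dash
    (\d+)   # third number, captured as group 1
""", re.VERBOSE)

def third_number(text):
    """Return the third number as a string, or None if no N-N-N run is found."""
    m = ptrn.search(text)
    return m.group(1) if m else None

print(third_number("5-6-60 Higashikonoike-cho, Higashi-Osaka-shi, Osaka 578-0973"))
```

Note that the postal code (for example 578-0973) contains only one dash, so it is never picked up by this pattern.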

We can use re.findall here:

import re

string1 = "10th floor, Shindaisou Building, 2-10-7 Dogenzaka, Shibuya-ku , Tokyo 150-0043"
matches = re.findall(r'\b\d+-\d+-(\d+)\b', string1)
print(matches[0])  # prints 7
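
One thing to watch for: matches[0] raises an IndexError when the string contains no N-N-N group. A small sketch of a guard, assuming you would rather get a fallback than an exception (third_numbers is a made-up helper name):

```python
import re

def third_numbers(text):
    # With exactly one capturing group, re.findall returns a list of
    # that group's strings, one per match (empty list if nothing matches)
    return re.findall(r'\b\d+-\d+-(\d+)\b', text)

matches = third_numbers("Osaka 578-0973")  # only one dash, so nothing matches
print(matches[0] if matches else "no match")
```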

Maybe not the best solution, but it works:

def function(string):
    # Split on whitespace and look for a token with exactly two dashes
    for token in string.split():
        if token.count('-') == 2:
            # Take the piece after the second dash
            result = token.split('-')[2]
            # Accept it only if it consists solely of digits
            if result.isdigit():
                return result
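
A slightly stricter variant of the same no-regex idea, as a sketch only: it requires all three dash-separated pieces to be digits and strips a trailing comma from each token first. Both of those are my assumptions about the input, and third_number_no_regex is a made-up name.

```python
def third_number_no_regex(text):
    for token in text.split():
        # Drop a trailing comma such as in "2-10-7," before splitting
        parts = token.strip(",").split("-")
        # Require exactly three pieces, all of them digits
        if len(parts) == 3 and all(p.isdigit() for p in parts):
            return parts[2]
    return None  # no N-N-N token found

print(third_number_no_regex("2-176-1 Takasu Misato-shi, Saitama-ken, 341-0037 Japan"))
```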

Using a regular expression:

import re
pattern = r'\d+\-[0-9]*\-(\d+)'
for i in (string1, string2, string3):
    res = re.findall(pattern, i)
    print(''.join(res))

# Output:
# 7
# 1
# 60