How do I iteratively append to text?

I have a DataFrame of URLs in Python, and I need to append page suffixes, up to page/9/, to each of them.

df:

/soccer/england/premier-league-2020-2021/results/
/soccer/england/premier-league-2019-2020/results/
/soccer/england/premier-league-2018-2019/results/

For each row in df, I have to append #/, #/page/2/, #/page/3/, #/page/4/, and so on up to #/page/9/, as shown below.

How can this be done in Python?

Expected df:

/soccer/england/premier-league-2020-2021/results/#/
/soccer/england/premier-league-2020-2021/results/#/page/2/
/soccer/england/premier-league-2020-2021/results/#/page/3/
.
.
/soccer/england/premier-league-2020-2021/results/#/page/9/
/soccer/england/premier-league-2019-2020/results/#/
/soccer/england/premier-league-2019-2020/results/#/page/2/
/soccer/england/premier-league-2019-2020/results/#/page/3/
.
.
/soccer/england/premier-league-2019-2020/results/#/page/9/
/soccer/england/premier-league-2018-2019/results/#/
/soccer/england/premier-league-2018-2019/results/#/page/2/
/soccer/england/premier-league-2018-2019/results/#/page/3/
.
.
/soccer/england/premier-league-2018-2019/results/#/page/9/

You can run a simple loop:

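import pandas as pd

df = pd.read_csv('data.csv')
# 'col' is an assumed name; use whichever column holds the URLs in your CSV
links_with_pages = []
for link in df['col'].tolist():
  # every link gets 9 variants: '#/' for page 1, then '#/page/2/' ... '#/page/9/'
  for page_num in range(1, 10):
    if page_num == 1:
      suffix = '#/'
    else:
      suffix = '#/page/' + str(page_num) + '/'
    links_with_pages.append(str(link) + suffix)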
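
A reusable variant, looping over every link and then over the page numbers, might look like the sketch below. `add_page_suffixes` and its `n_pages` parameter are hypothetical names chosen for illustration; the input links are assumed to end with a trailing slash, as in the question.

```python
def add_page_suffixes(links, n_pages=9):
    """Hypothetical helper: return each link followed by its page variants, in order."""
    result = []
    for link in links:
        for page_num in range(1, n_pages + 1):
            # Page 1 has no explicit page segment; later pages get '#/page/N/'
            suffix = '#/' if page_num == 1 else f'#/page/{page_num}/'
            result.append(link + suffix)
    return result


links = [
    '/soccer/england/premier-league-2020-2021/results/',
    '/soccer/england/premier-league-2019-2020/results/',
]
for url in add_page_suffixes(links, n_pages=3):
    print(url)
```

Keeping the page count as a parameter makes it easy to handle seasons whose results span a different number of pages.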

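Alternatively, if each URL already appears once per page, count how many times each one has been seen so far:

URLS = df['col'].tolist()  # the URL column; 'col' is an assumed name
processed = dict()
for URL in URLS:
    if URL not in processed:
        # first occurrence: page 1
        processed[URL] = 1
        print(URL + '#/')
    else:
        # later occurrences: page 2, 3, ...
        processed[URL] = processed[URL] + 1
        print(URL + f'#/page/{processed[URL]}/')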
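
The running count kept in `processed` can also be computed without a Python loop via `groupby(...).cumcount()`, which numbers the rows within each group starting at 0. This is a sketch, assuming the URLs are in a column named `col` and each URL already appears once per page (three pages in this small, made-up input):

```python
import pandas as pd

# Assumed input: each URL repeated once per page (3 pages here, interleaved)
df = pd.DataFrame({'col': [
    '/soccer/england/premier-league-2020-2021/results/',
    '/soccer/england/premier-league-2019-2020/results/',
    '/soccer/england/premier-league-2020-2021/results/',
    '/soccer/england/premier-league-2019-2020/results/',
    '/soccer/england/premier-league-2020-2021/results/',
    '/soccer/england/premier-league-2019-2020/results/',
]})

# 1 for the first occurrence of a URL, 2 for the second, and so on
page = df.groupby('col').cumcount() + 1

# First occurrence gets '#/', later ones get '#/page/N/'
suffix = ('#/page/' + page.astype(str) + '/').mask(page.eq(1), '#/')
df['col'] = df['col'] + suffix
print(df['col'].tolist())
```
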

Or build the list directly from the base URLs with two nested loops:

year_results = [
    "/soccer/england/premier-league-2020-2021/results",
    "/soccer/england/premier-league-2019-2020/results",
    "/soccer/england/premier-league-2018-2019/results",
]
sub_pages = []
for year_res in year_results:
    for page in range(1, 10):
        # page 1 gets just '#/', later pages get '#/page/N/'
        suffix = f"page/{page}/" if page != 1 else ''
        sub_pages.append(f"{year_res}/#/{suffix}")

Or, equivalently:

sub_pages = [f"{year_res}/#/{f'page/{page}/' if page!=1 else ''}" for year_res in year_results for page in range(1, 10)]
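
The comprehension walks over every (base URL, page number) pair, so `itertools.product` from the standard library can express the same Cartesian product directly. A sketch that redefines `year_results` so it runs on its own:

```python
from itertools import product

year_results = [
    "/soccer/england/premier-league-2020-2021/results",
    "/soccer/england/premier-league-2019-2020/results",
    "/soccer/england/premier-league-2018-2019/results",
]

# product() yields pairs in the same order as the nested loops:
# all 9 pages of the first season, then all 9 of the next, and so on
sub_pages = [
    f"{year_res}/#/" + (f"page/{page}/" if page != 1 else "")
    for year_res, page in product(year_results, range(1, 10))
]
print(len(sub_pages))
print(sub_pages[:2])
```
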

The sample DataFrame I used:

import pandas as pd

df=pd.DataFrame({'col': {0: '/soccer/england/premier-league-2020-2021/results/',
  1: '/soccer/england/premier-league-2019-2020/results/',
  2: '/soccer/england/premier-league-2018-2019/results/',
  3: '/soccer/england/premier-league-2020-2021/results/',
  4: '/soccer/england/premier-league-2019-2020/results/',
  5: '/soccer/england/premier-league-2018-2019/results/',
  6: '/soccer/england/premier-league-2020-2021/results/',
  7: '/soccer/england/premier-league-2019-2020/results/',
  8: '/soccer/england/premier-league-2018-2019/results/',
  9: '/soccer/england/premier-league-2020-2021/results/',
  10: '/soccer/england/premier-league-2019-2020/results/',
  11: '/soccer/england/premier-league-2018-2019/results/'}})

You can try:

df['h']=df.index%9+1
#created a helper column with page numbers 1-9 (assumes each URL fills 9 consecutive rows)
df['col']=df['col']+("#/page/"+df['h'].astype(str)+'/').mask(df['h'].eq(1),"#/")
#conditionally adding '#/page/<pagenumber>/', or '#/' for page 1
df=df.drop(columns='h')
#removed that helper column

Now, if you run print(df), you will get the desired output.
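
The key step is `Series.mask(cond, other)`, which keeps each value where `cond` is False and replaces it with `other` where `cond` is True. On a tiny, made-up helper column it behaves like this:

```python
import pandas as pd

# Hypothetical helper column holding page numbers 1-3
h = pd.Series([1, 2, 3])

# Build '#/page/N/' for every row, then replace the page-1 entry with '#/'
suffix = ("#/page/" + h.astype(str) + "/").mask(h.eq(1), "#/")
print(suffix.tolist())  # ['#/', '#/page/2/', '#/page/3/']
```
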

Update:

IIUC, you need 9 URLs for each unique URL, so:

out=pd.DataFrame(df['col'].unique(),columns=['col'])
#created a dataframe from the unique values of 'col' column
out=out.reindex(out.index.repeat(9)).reset_index(drop=True)
#repeated each row 9 times
out['h']=out.index%9+1
#created a helper column with page numbers 1-9
out['col']=out['col']+("#/page/"+out['h'].astype(str)+'/').mask(out['h'].eq(1),"#/")
#conditionally adding '#/page/<pagenumber>/', or '#/' for page 1
out=out.drop(columns='h')
#removed that helper column

Now, if you run print(out), you will get the desired output.
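
Another way to get 9 rows per unique URL is a cross join between the unique URLs and a small table of suffixes, using `DataFrame.merge(how='cross')` (available since pandas 1.2). This is a sketch, assuming the URLs are in column `col` as in the sample DataFrame; the `pages` frame and its `suffix` column are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'col': [
    '/soccer/england/premier-league-2020-2021/results/',
    '/soccer/england/premier-league-2019-2020/results/',
    '/soccer/england/premier-league-2018-2019/results/',
    '/soccer/england/premier-league-2020-2021/results/',
]})

# One row per suffix: '#/' for page 1, then '#/page/2/' ... '#/page/9/'
pages = pd.DataFrame({'suffix': ['#/'] + [f'#/page/{n}/' for n in range(2, 10)]})

# Cross join: every unique URL paired with every suffix, keeping the left order
out = pd.DataFrame({'col': df['col'].unique()}).merge(pages, how='cross')
out['col'] = out['col'] + out['suffix']
out = out[['col']]
print(out.head(10))
```

Compared with repeating rows and computing a helper column, the suffixes here are spelled out once in `pages`, so changing the page range only touches that one line.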