Merge commits that have no file-change data are not shown in the dataframe
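I extracted the git log with the command git log --all --numstat --pretty=format:'--%h--%ad--%aN--%s' > ../react_git.logs
I would also like a merge-commit column, so that I can analyze things such as which author has the most merge commits. This is the code I use to build a dataframe from the generated log: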
import os
import pandas as pd

COMMIT_LOG = os.path.join(os.path.abspath(''), 'react_git.logs')
raw_df = pd.read_csv(COMMIT_LOG, sep="\u0012", header=None, names=["raw"])
commit_marker = raw_df[raw_df["raw"].str.startswith("--", na=False)]
commit_info = commit_marker['raw'].str.extract(r"^--(?P<sha>.*?)--(?P<timestamp>.*?)--(?P<author>.*?)--(?P<message>.*?)$", expand=True)
commit_info.insert(loc=2, column='date', value=pd.to_datetime(commit_info['timestamp'], utc=True))
commit_info_copy = commit_info.loc[:]
commit_info_copy['today'] = pd.to_datetime('today', utc=True)
commit_info['age'] = commit_info_copy['date'] - commit_info_copy['today']
file_stats_marker = raw_df[~raw_df.index.isin(commit_info.index)]
file_stats = file_stats_marker['raw'].str.split("\t", expand=True)
file_stats = file_stats.rename(columns={0: "insertion", 1: "deletion", 2: "filepath"})
file_stats['insertion'] = pd.to_numeric(file_stats['insertion'], errors="coerce")
file_stats['deletion'] = pd.to_numeric(file_stats['deletion'], errors="coerce")
file_stats['churn'] = file_stats['insertion'] - file_stats['deletion']
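# Forward-fill so every file-stat line carries the data of the commit above it
# (fillna(method="ffill") is deprecated in recent pandas releases)
commit_data = commit_info.reindex(raw_df.index).ffill()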
commit_data = commit_data[~commit_data.index.isin(commit_info.index)]
df = commit_data.join(file_stats)
print(df.size)
print(df.head())
If I have a log like this:
--cae635054--Sat Jun 26 14:51:23 2021 -0400--Andrew Clark--`act`: Resolve to return value of scope function (#21759)
31 0 packages/react-reconciler/src/__tests__/ReactIsomorphicAct-test.js
1 1 packages/react-test-renderer/src/ReactTestRenderer.js
24 14 packages/react/src/ReactAct.js
--e2453e200--Fri Jun 25 15:39:46 2021 -0400--Andrew Clark--act: Add test for bypassing queueMicrotask (#21743)
50 0 packages/react-reconciler/src/__tests__/ReactIsomorphicAct-test.js
--8f03109cd--Wed Sep 11 09:51:32 2019 -0700--Brian Vaughn--Moved backend injection to the content script (#16752)
--efa780d0a--Wed Sep 11 09:51:24 2019 -0700--Brian Vaughn--Removed DT inject() script since it's no longer being used
0 24 packages/react-devtools-extensions/src/inject.js
--4290967d4--Wed Sep 11 09:34:31 2019 -0700--Brian Vaughn--Merge branch 'tt-compat' of https://github.com/onionymous/react into onionymous-tt-compat
--f09854a9e--Wed Sep 11 09:30:57 2019 -0700--Brian Vaughn--Moved inline comment.
3 5 packages/react-devtools-extensions/src/injectGlobalHook.js
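then 8f03109cd and 4290967d4 will not be in the dataframe, even though they matter for counting merge commits, as described above.
How can I get these commits into the dataframe with insertion: 0, deletion: 0 and filepath: 0, plus a column (or any other marker) that distinguishes them from the rest, so that it is easy to tell they are merge commits?
I also have this, together with the log file, on repl: https://replit.com/@milanregmi/metricsLogs#main.py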
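As I understand it, your question has two parts:
- how to get information about merge commits out of the git log
- how to count the merges per user and write the count into the dataframe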
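First, you can add the %b placeholder to the --pretty format of git log. It gives you the commit body, which for a merge commit like the one below starts with a line saying that the commit is a merge. A format string like this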
git log --all --numstat --pretty=format:'--%h--%ad--%aN--%s--%b'
then produces something like this:
--364d23ac--Wed Jul 14 13:44:55 2021 +0200--Doe, John--Pull request #114: Bugfix/foo-branch--Merge in <repository> from bugfix/foo-branch to dev
* commit '64ee15b12345670d1ec214cb83468cf0a55a341':
Bugfix: lorem ipsum
Fix: dolor sit amet
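You can look for Merge in the commit_marker dataframe to identify merge commits. Second, you can count the merges per author by building a reverse cumulative count over each author's merge commits.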
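The reverse cumulative count can also be written without an explicit loop. The following is only a minimal sketch on a made-up commit table (the sha values, bodies and the name `totals` are hypothetical, not taken from the real log); it assumes rows are ordered newest first, as git log prints them, and that a body starting with "Merge" marks a merge commit.

```python
import pandas as pd

# Hypothetical commit table, newest commit first (the order git log uses).
commits = pd.DataFrame({
    "sha": ["a1", "b2", "c3", "d4", "e5"],
    "author": ["Doe, John", "Cow, Jane", "Doe, John", "Doe, John", "Cow, Jane"],
    "body": [
        "Merge in repo from x to dev",
        "",
        "Merge in repo from y to dev",
        "fix typo",
        "Merge in repo from dev to master",
    ],
})

# Assumption: a body that starts with "Merge" identifies a merge commit.
is_merge = commits["body"].str.startswith("Merge", na=False)
merges = commits[is_merge].copy()

# cumcount(ascending=False) numbers each author's rows from n-1 down to 0,
# so adding 1 gives a counter that runs n, n-1, ..., 1 from newest to oldest.
merges["merges"] = merges.groupby("author").cumcount(ascending=False) + 1

# Assignment aligns on the index; commits that are not merges stay NaN.
commits["merges"] = merges["merges"]

# Total number of merge commits per author.
totals = merges.groupby("author").size()
print(commits)
print(totals)
```

With this numbering, the newest merge of each author carries that author's total, which is the same shape of result the loop-based version produces.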
Putting it all together:
import os
import pandas as pd
COMMIT_LOG = os.path.join(os.path.abspath(''), 'react_git.logs')
raw_df = pd.read_csv(COMMIT_LOG, sep="\u0012", header=None, names=["raw"])
commit_marker = raw_df[raw_df["raw"].str.startswith("--", na=False)]
# Add extraction for the new body part
commit_info = commit_marker['raw'].str.extract(r"^--(?P<sha>.*?)--(?P<timestamp>.*?)--(?P<author>.*?)--(?P<message>.*?)--(?P<body>.*?)$", expand=True)
commit_info.insert(loc=2, column='date', value=pd.to_datetime(commit_info['timestamp'], utc=True))
commit_info_copy = commit_info.loc[:]
commit_info_copy['today'] = pd.to_datetime('today', utc=True)
commit_info['age'] = commit_info_copy['date'] - commit_info_copy['today']
file_stats_marker = raw_df[~raw_df.index.isin(commit_info.index)]
file_stats = file_stats_marker['raw'].str.split("\t", expand=True)
file_stats = file_stats.rename(columns={0: "insertion", 1: "deletion", 2: "filepath"})
file_stats['insertion'] = pd.to_numeric(file_stats['insertion'], errors="coerce")
file_stats['deletion'] = pd.to_numeric(file_stats['deletion'], errors="coerce")
file_stats['churn'] = file_stats['insertion'] - file_stats['deletion']
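# Forward-fill the commit data onto the lines that follow each commit marker
# (fillna(method="ffill") is deprecated in recent pandas releases)
commit_data = commit_info.reindex(raw_df.index).ffill()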
commit_data = commit_data[~commit_data.index.isin(commit_info.index)]
df = commit_data.join(file_stats)
# Remove additional lines coming from the git commit body.
df.drop_duplicates(inplace=True)
# Count merges per author as a descending counter (rows are newest first),
# so each author's newest merge carries that author's total
is_merge = df['body'].str.contains('Merge', na=False)
for author in df['author'].unique():
    idx = df[(df['author'] == author) & is_merge].index
    df.loc[idx, 'merges'] = list(range(len(idx), 0, -1))
print(df.size)
print(df.head())
You end up with your previous dataframe plus a merges column that holds a running count of each author's merges:
sha | timestamp | date | author | message | body | age | insertion | deletion | filepath | churn | merges
b3e5eb7 | Fri Jul 16 11:36:43 2021 +0200 | 2021-07-16 09:36:43+00:00 | Doe, John | test deploy | | -4 days +00:51:00.878903000 | 31.0 | 3.0 | file/path/file1.py | 28.0 |
4fc0c34 | Thu Jul 15 11:12:10 2021 +0200 | 2021-07-15 09:12:10+00:00 | Cow, Jane | Pull request #116: Dev | Merge in repo from dev to master | -5 days +00:26:27.878903000 | | | | | 14.0
8188751 | Thu Jul 15 07:42:40 2021 +0200 | 2021-07-15 05:42:40+00:00 | Doe, John | Pull request #115: Feature/foo-bar | Merge in repo from feature/foo-bar to dev | -6 days +20:56:57.878903000 | | | | | 7.0
6fa89c3 | Wed Jul 14 16:02:38 2021 +0200 | 2021-07-14 14:02:38+00:00 | Cow, Jane | Added: foo bar | | -6 days +05:16:55.878903000 | 4056.0 | 0.0 | file/path/file2.py | 4056.0 |
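If the goal is to keep commits that have no file lines in the original '--%h--%ad--%aN--%s' format (without %b), one possible approach is a left join from the commit table to the file-stat table. The sketch below is an assumption-laden illustration, not a drop-in replacement: it embeds a few lines of the question's log as a string, and the names `commit_row`, `has_file_stats` and `is_merge` are made up here. Flagging merges by a subject that starts with "Merge" is only a heuristic; asking git for parent hashes (the %p placeholder) would likely be more reliable, since a merge commit has more than one parent.

```python
import pandas as pd

# A few lines from the question's log; numstat fields are tab-separated.
LOG = (
    "--e2453e200--Fri Jun 25 15:39:46 2021 -0400--Andrew Clark--act: Add test for bypassing queueMicrotask (#21743)\n"
    "50\t0\tpackages/react-reconciler/src/__tests__/ReactIsomorphicAct-test.js\n"
    "--8f03109cd--Wed Sep 11 09:51:32 2019 -0700--Brian Vaughn--Moved backend injection to the content script (#16752)\n"
    "--efa780d0a--Wed Sep 11 09:51:24 2019 -0700--Brian Vaughn--Removed DT inject() script since it's no longer being used\n"
    "0\t24\tpackages/react-devtools-extensions/src/inject.js\n"
    "--4290967d4--Wed Sep 11 09:34:31 2019 -0700--Brian Vaughn--Merge branch 'tt-compat' of https://github.com/onionymous/react into onionymous-tt-compat\n"
    "--f09854a9e--Wed Sep 11 09:30:57 2019 -0700--Brian Vaughn--Moved inline comment.\n"
    "3\t5\tpackages/react-devtools-extensions/src/injectGlobalHook.js\n"
)

raw = pd.DataFrame({"raw": [line for line in LOG.splitlines() if line.strip()]})
is_commit = raw["raw"].str.startswith("--")

# Tag every line with the row number of the commit marker it belongs to.
raw["commit_row"] = (
    pd.Series(raw.index, index=raw.index).where(is_commit).ffill().astype(int)
)

# One row per commit.
commits = raw.loc[is_commit, "raw"].str.extract(
    r"^--(?P<sha>.*?)--(?P<timestamp>.*?)--(?P<author>.*?)--(?P<message>.*)$"
)
commits["commit_row"] = commits.index

# One row per changed file.
stats = raw.loc[~is_commit, "raw"].str.split("\t", n=2, expand=True)
stats.columns = ["insertion", "deletion", "filepath"]
stats["insertion"] = pd.to_numeric(stats["insertion"], errors="coerce")
stats["deletion"] = pd.to_numeric(stats["deletion"], errors="coerce")
stats["commit_row"] = raw.loc[~is_commit, "commit_row"]

# The left join keeps commits that have no file lines at all.
df = commits.merge(stats, on="commit_row", how="left")
df["has_file_stats"] = df["filepath"].notna()
# Note: binary files ("-" in numstat) also become NaN and end up as 0 here.
df[["insertion", "deletion"]] = df[["insertion", "deletion"]].fillna(0)
df["filepath"] = df["filepath"].fillna("0")

# Heuristic (assumption): a subject starting with "Merge" marks a merge commit.
df["is_merge"] = df["message"].str.startswith("Merge")

merge_counts = df[df["is_merge"]].groupby("author")["sha"].nunique()
print(df[["sha", "author", "insertion", "deletion", "filepath", "is_merge"]])
print(merge_counts)
```

In this sample both 8f03109cd and 4290967d4 survive the join with zeros in the stat columns, but only 4290967d4 is flagged by the subject heuristic, so `has_file_stats` may be the more useful column for spotting commits like 8f03109cd.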