Calculating R combinations from CSV file

Question

I have a CSV file containing about 400 values, ranging from 10,000 to 50,000.

I want to work out which combinations of a set of selected values (for example 100, 150, 200, 250) correspond to values in the CSV file.
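Stated as an equation, the goal is as follows. Let $v_1, \dots, v_k$ be the selected values and $T$ the set of values in the CSV file. Find every set of multipliers $c_i$ that satisfies the equation below. This restatement assumes, based on the worked example later in the question, that each selected value may be used any non-negative whole number of times.

$$
\sum_{i=1}^{k} c_i \, v_i = t, \qquad c_i \in \{0, 1, 2, \dots\}, \qquad t \in T
$$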

Can this be done in R?

Here is part of the data:

1359.214844
1604.558594
1701.759766
1761.083984
1792.990234
1926.248047
1958.144531
2086.373047
2114.501953
2142.542969
2204.325621
2216.468750
2229.136719
2286.894531
2302.847656
2379.826172
2395.039063
2467.578125
2610.802734
2797.929688
2812.916016
2838.947266
2979.498047
3122.171875
3163.671875
3457.794922
3809.228516
3826.058594
3952.609375
3983.210938
4102.996094

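The second data set is (146.058, 203.193, 162.053, 291.095). I need the possible combinations of values from the second data set that correspond to values in the first. For example, 291*2 + 162*5 + 203*4 = 2204.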
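The following snippet checks that example in R. It is a small sketch, and the multiplier vector `coeffs` is ordered like the second data set. The total is exactly 2204 only when the values are rounded to whole numbers. With the original decimals it is 2205.227, so any matching will need either rounding or a tolerance.

```r
# Second data set from the question, in the order given
v_exact   <- c(146.058, 203.193, 162.053, 291.095)
v_rounded <- round(v_exact)   # 146 203 162 291

# Multipliers for 291*2 + 162*5 + 203*4, in the same order as v_exact
coeffs <- c(0, 4, 5, 2)

sum(coeffs * v_rounded)   # 2204
sum(coeffs * v_exact)     # 2205.227
```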

Answer 1

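There are other ways to do this, such as a loop that checks one combination per iteration i and decides whether to keep or discard it, but I'd rather avoid loops where possible.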
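For comparison, the loop approach mentioned above might look like the following minimal base-R sketch. The function name `find_combos_loop`, the restriction to exactly four values and the two target values are all assumptions for illustration. Each innermost iteration tests one combination and keeps it only if its total appears among the targets.

```r
# Hypothetical helper: brute-force loop over all coefficient combinations.
# Assumes exactly 4 (rounded) values and coefficients in 0..max_coeff.
find_combos_loop <- function(targets, values, max_coeff) {
  hits <- list()
  for (c1 in 0:max_coeff) {
    for (c2 in 0:max_coeff) {
      for (c3 in 0:max_coeff) {
        for (c4 in 0:max_coeff) {
          total <- c1 * values[1] + c2 * values[2] + c3 * values[3] + c4 * values[4]
          # keep the combination only if its total is one of the targets
          if (total %in% targets) {
            hits[[length(hits) + 1]] <- c(c1, c2, c3, c4, total)
          }
        }
      }
    }
  }
  if (length(hits) == 0) return(NULL)
  out <- do.call(rbind, hits)
  colnames(out) <- c("coeff1", "coeff2", "coeff3", "coeff4", "value")
  out
}

values    <- round(c(146.058, 203.193, 162.053, 291.095))  # 146 203 162 291
targets   <- c(2204, 2468)                                  # a few rounded CSV values
max_coeff <- ceiling(max(targets) / min(values))            # 17

res <- find_combos_loop(targets, values, max_coeff)
head(res)
```

This works, but the number of iterations grows as (max_coeff + 1)^4, and R loops are slow. That is why the vectorised `expand.grid()` version is usually preferred.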

library(dplyr)

dt = read.table(text = "1359.214844
                1604.558594
                1701.759766
                1761.083984
                1792.990234
                1926.248047
                1958.144531
                2086.373047
                2114.501953
                2142.542969
                2204.325621
                2216.468750
                2229.136719
                2286.894531
                2302.847656
                2379.826172
                2395.039063
                2467.578125
                2610.802734
                2797.929688
                2812.916016
                2838.947266
                2979.498047
                3122.171875
                3163.671875
                3457.794922
                3809.228516
                3826.058594
                3952.609375
                3983.210938
                4102.996094")

# change column name and round values
names(dt) = "value"
dt$value = round(dt$value)

# give the manual values (assuming they are 4 values)
manual_values = c(146.058, 203.193, 162.053, 291.095)

# round values
manual_values = round(manual_values)


# get the maximum coefficient to investigate
coeff = ceiling(max(dt$value) / min(manual_values))


expand.grid(v1 = manual_values[1],  ## create all combinations of coefficients and keep your values
            v2 = manual_values[2],
            v3 = manual_values[3],
            v4 = manual_values[4],
            coeff1 = 0:coeff,
            coeff2 = 0:coeff,
            coeff3 = 0:coeff,
            coeff4 = 0:coeff) %>%
  mutate(value = v1*coeff1+v2*coeff2+v3*coeff3+v4*coeff4) %>%  ## calculate the value from each combination
  inner_join(dt, by="value")  ## join info from your initial values


## sample of the first 10 rows of the result :

#      v1  v2  v3  v4 coeff1 coeff2 coeff3 coeff4 value
# 1   146 203 162 291      3     10      0      0  2468
# 2   146 203 162 291      7     12      0      0  3458
# 3   146 203 162 291      9     13      0      0  3953
# 4   146 203 162 291      7      3      1      0  1793
# 5   146 203 162 291     22      3      1      0  3983
# 6   146 203 162 291     15      4      1      0  3164
# 7   146 203 162 291      4      5      1      0  1761
# 8   146 203 162 291      0     11      1      0  2395
# 9   146 203 162 291      4     11      1      0  2979
# 10  146 203 162 291      2     14      2      0  3458

So the first row of the output tells you that the combination 3*146 + 10*203 equals 2468, which (after rounding) is a value in your initial dataset (the CSV).
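To read the result rows in that form, a small helper can turn one row of coefficients into a formula string and drop the zero terms. This is a sketch, and `describe_combo` is a hypothetical name, not part of dplyr or base R.

```r
# Hypothetical helper: turn one coefficient row into a readable formula,
# dropping the values whose coefficient is 0.
describe_combo <- function(values, coeffs) {
  keep  <- coeffs != 0
  terms <- paste0(coeffs[keep], "*", values[keep])
  paste(paste(terms, collapse = " + "), "=", sum(values * coeffs))
}

values <- c(146, 203, 162, 291)   # rounded second data set
describe_combo(values, c(3, 10, 0, 0))
# "3*146 + 10*203 = 2468"
```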

If you spot any errors or need any clarification, let me know and I can update my answer.

You can replace the final inner_join with filter(value %in% dt$value), which is a small improvement: when a filter gives the same output, I see no reason to join.
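The same filtering step can also be written in base R without dplyr, as in the sketch below. Three rounded CSV values stand in for `dt$value`. A matrix product of the coefficient grid with the values plays the role of the `mutate()` line, and logical indexing with `%in%` plays the role of `filter()`.

```r
# Rounded second data set and a few rounded CSV values (stand-ins for dt$value)
values  <- c(146, 203, 162, 291)
targets <- c(2204, 2468, 3458)

max_coeff <- ceiling(max(targets) / min(values))

# All coefficient combinations, one column per value
grid <- expand.grid(coeff1 = 0:max_coeff, coeff2 = 0:max_coeff,
                    coeff3 = 0:max_coeff, coeff4 = 0:max_coeff)

# Total of each combination: the matrix product does the job of mutate()
grid$value <- drop(as.matrix(grid) %*% values)

# Keep only totals that occur in the CSV, i.e. filter(value %in% dt$value)
hits <- grid[grid$value %in% targets, ]
head(hits)
```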

For your other objective (specified in the comments), try this:

library(dplyr)

dt = read.table(text = "1359.214844
                1604.558594
                1701.759766
                1761.083984
                1792.990234
                1926.248047
                1958.144531
                2086.373047
                2114.501953
                2142.542969
                2204.325621
                2216.468750
                2229.136719
                2286.894531
                2302.847656
                2379.826172
                2395.039063
                2467.578125
                2610.802734
                2797.929688
                2812.916016
                2838.947266
                2979.498047
                3122.171875
                3163.671875
                3457.794922
                3809.228516
                3826.058594
                3952.609375
                3983.210938
                4102.996094")

# change column name and round values
names(dt) = "value"
dt$value = round(dt$value)

# give the manual values (assuming they are 4 values)
manual_values = c(146.058, 203.193, 162.053, 291.095)

# get the maximum coefficient to investigate
coeff = ceiling(max(dt$value) / min(manual_values))


expand.grid(v1 = manual_values[1],  ## create all combinations of coefficients and keep your values
            v2 = manual_values[2],
            v3 = manual_values[3],
            v4 = manual_values[4],
            coeff1 = 0:3,
            coeff2 = 5:coeff,
            coeff3 = 5:coeff,
            coeff4 = 0:3) %>%
  mutate(SUM = v1*coeff1+v2*coeff2+v3*coeff3+v4*coeff4) %>%  ## calculate the value of each combination
  as_tibble()                       ## only for printing the top 10 rows (tbl_df() is deprecated)


#         v1      v2      v3      v4 coeff1 coeff2 coeff3 coeff4      SUM
#      (dbl)   (dbl)   (dbl)   (dbl)  (int)  (int)  (int)  (int)    (dbl)
# 1  146.058 203.193 162.053 291.095      0      5      5      0 1826.230
# 2  146.058 203.193 162.053 291.095      1      5      5      0 1972.288
# 3  146.058 203.193 162.053 291.095      2      5      5      0 2118.346
# 4  146.058 203.193 162.053 291.095      3      5      5      0 2264.404
# 5  146.058 203.193 162.053 291.095      0      6      5      0 2029.423
# 6  146.058 203.193 162.053 291.095      1      6      5      0 2175.481
# 7  146.058 203.193 162.053 291.095      2      6      5      0 2321.539
# 8  146.058 203.193 162.053 291.095      3      6      5      0 2467.597
# 9  146.058 203.193 162.053 291.095      0      7      5      0 2232.616
# 10 146.058 203.193 162.053 291.095      1      7      5      0 2378.674
# ..     ...     ...     ...     ...    ...    ...    ...    ...      ...

You can save this result table as a data frame and then continue your process as needed.
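As one way to continue, the base-R sketch below keeps the unrounded values. Instead of requiring an exact match, it keeps each combination whose SUM lies within a tolerance of some CSV value. The tolerance `tol` and the three stand-in CSV values are assumptions. For example, 3*146.058 + 6*203.193 + 5*162.053 = 2467.597, which is within 0.02 of the CSV value 2467.578125.

```r
# A few unrounded values from the CSV (stand-ins for the full file)
csv_values <- c(2204.325621, 2467.578125, 3457.794922)
v <- c(146.058, 203.193, 162.053, 291.095)

max_coeff <- ceiling(max(csv_values) / min(v))

# Coefficient ranges as in the code above (0:3 and 5:max_coeff)
combos <- expand.grid(coeff1 = 0:3, coeff2 = 5:max_coeff,
                      coeff3 = 5:max_coeff, coeff4 = 0:3)
combos$SUM <- drop(as.matrix(combos) %*% v)

# Distance from each SUM to the closest CSV value
combos$gap <- sapply(combos$SUM, function(s) min(abs(csv_values - s)))

tol <- 0.5   # assumed tolerance, in the same units as the data
matches <- combos[combos$gap <= tol, ]
head(matches[order(matches$gap), ])
```

The `gap` column makes it easy to tighten or loosen `tol` afterwards, or to sort by closeness of fit.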
