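Dealing with missing information while converting a list into data frame or data table
As a follow-up to a related question: is there any way to convert a list of named elements, some of whose names are repeated, into a data table in which the NA values actually appear in the positions implied by the order of the elements in the list?
For example, the list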
testlist <- list("Blue", "405", "Truck", "400", "Car", "White", "500", "Truck")
testnames <- c("Color", "HP", "Type", "HP", "Type", "Color", "HP", "Type")
names(testlist) <- testnames
$Color
[1] "Blue"
$HP
[1] "405"
$Type
[1] "Truck"
$HP
[1] "400"
$Type
[1] "Car"
$Color
[1] "White"
$HP
[1] "500"
$Type
[1] "Truck"
can be converted to a data table using:
dcast(setDT(melt(testlist))[, N:=1:.N, L1], N~L1, value.var='value')
But the output looks like this:
N Color HP Type
1 1 Blue 405 Truck
2 2 White 400 Car
3 3 <NA> 500 Truck
when what I want is:
N Color HP Type
1 1 Blue 405 Truck
2 2 <NA> 400 Car
3 3 White 500 Truck
Does anyone have a suggestion for how to solve this? Thanks for your help.
This is probably not the best solution, since it uses a while loop. It does work, however, using tidyr or your favorite other reshaping package.
testlist <- c("Blue", "405", "Truck", "400", "Car", "White", "500", "Truck")
testnames <- c("Color", "HP", "Type", "HP", "Type", "Color", "HP", "Type")
df <- data.frame(names = testnames, attributes = testlist, stringsAsFactors = FALSE)
# need to count number of vehicles inside data frame
# initialise while loop counters
df_index = 1
vehicle_index = vector(mode = "integer", length = nrow(df))
vehicle_count = 1
# now loop through the data frame to find attributes
# which belong to vehicle 1, 2, 3, etc...
while (df_index <= nrow(df)) {
  # isTRUE(all(...)) avoids an NA condition when the 3-element window
  # runs past the last row of the data frame
  if (isTRUE(all(c("Color", "HP", "Type") == df$names[df_index:(df_index+2)]))) {
    # a complete vehicle: Color, HP, Type
    vehicle_index[df_index:(df_index+2)] <- vehicle_count
    df_index = df_index + 3
    vehicle_count = vehicle_count + 1
  } else if (sum(c("Color", "HP", "Type") %in% df$names[df_index:(df_index+1)]) == 2) {
    # a vehicle with exactly two of the three attributes
    vehicle_index[df_index:(df_index+1)] <- vehicle_count
    df_index = df_index + 2
    vehicle_count = vehicle_count + 1
  } else {
    # a vehicle with a single attribute
    vehicle_index[df_index:(df_index)] <- vehicle_count
    df_index = df_index + 1
    vehicle_count = vehicle_count + 1
  }
}
# finally, label the vehicle attributes with the vehicle number,
# and spread the data.
df_final <- data.frame(df, vehicle_index = vehicle_index)
tidyr::spread(df_final, key = "names", value = "attributes")
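One approach is to preallocate the table with the correct number of rows and the correct number, names, and types of columns, and then fill it in by index-assigning the cells that are covered by the original list.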
cns <- c('Color','HP','Type');
lcis <- match(names(testlist),cns);
lris <- c(1L,cumsum(diff(lcis)<=0L)+1L);
df <- as.data.frame(testlist[match(1:length(cns),lcis)],stringsAsFactors=F)[0,];
df[max(lris),] <- NA;
df;
## Color HP Type
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
for (ci in 1:length(cns)) { m <- lcis==ci; df[lris[m],ci] <- do.call(c,testlist[m]); };
df;
## Color HP Type
## 1 Blue 405 Truck
## 2 <NA> 400 Car
## 3 White 500 Truck
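In my solution I was careful to handle each column separately. This has the potential benefit that, if different columns of the output table (corresponding to different subsets of the input list components) have different data types, those types will be preserved in the final table. That is why I chose a for loop for the index assignment. For this exact input list, which contains only character values, this is of course not necessary, but I think it is a worthwhile goal regardless.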
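To illustrate the type-preservation point, here is a minimal sketch that applies the same preallocate-and-fill steps to a made-up mixed-type list. The list `mixed` and the result name `out` are assumptions introduced only for this illustration; they are not part of the original data.

```r
# Hypothetical input: HP is stored as numeric, the other components as character.
mixed <- list(Color = "Blue", HP = 405, Type = "Truck",
              HP = 400, Type = "Car",
              Color = "White", HP = 500, Type = "Truck")

cns  <- c("Color", "HP", "Type")
lcis <- match(names(mixed), cns)
lris <- c(1L, cumsum(diff(lcis) <= 0L) + 1L)

# Zero-row template built from the first occurrence of each column,
# so every column starts out with the type of its first value.
out <- as.data.frame(mixed[match(seq_along(cns), lcis)],
                     stringsAsFactors = FALSE)[0, ]
out[max(lris), ] <- NA

# Fill one column at a time so that each column keeps its own type.
for (ci in seq_along(cns)) {
  m <- lcis == ci
  out[lris[m], ci] <- do.call(c, mixed[m])
}

sapply(out, class)  # Color and Type are character, HP is numeric
```

Had all values been combined into a single vector before assignment, the numeric HP values would have been coerced to character; filling column by column avoids that.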
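Explanation of intermediate variables
cns
The column names of the output table.
lcis
The column index that each input list component will have in the output table. It is computed by simply matching the names of the input list components against cns.
lris
The row index that each input list component will have in the output table. The computation of this variable is the interesting part and the heart of the solution. Columns can be incompletely represented in the input list (in other words, there can be "missing columns"), but you assume that the input list components are ordered according to their row-wise appearance in the output table. We therefore cannot use regular indexing (such as taking every three components as a row), nor can we use any single column name as the marker of each row, because any column can be missing from any row. As I see it, the only correct approach is to identify where a lower-index (or equal-index) column immediately follows a higher-index (or equal-index) column in the input list, and to treat that as a row break. So we can take diff(lcis)<=0L to get a logical vector marking the row breaks, take cumsum() of it and add 1 to get the row indexes, and manually prepend a 1 to complete the vector.
ci
A column index into the output table. Used in the for loop to iterate over each output column.
m
Computed for each ci in the for loop. A logical vector indicating which input list components belong to the current column ci. Used to index both lris (to extract the row indexes for the assignment) and the input list itself (to extract the actual values to assign).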
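The row-break computation described above can be traced step by step on the toy names from the question. This is only a sketch; the intermediate names `nms` and `brk` are introduced here for illustration and do not appear in the original code.

```r
cns <- c("Color", "HP", "Type")
nms <- c("Color", "HP", "Type", "HP", "Type", "Color", "HP", "Type")

# Column index of each list component.
lcis <- match(nms, cns)
lcis
## [1] 1 2 3 2 3 1 2 3

# TRUE wherever the column index fails to increase, i.e. a new row starts.
brk <- diff(lcis) <= 0L
brk
## [1] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE

# Running count of breaks, shifted to start at row 1; the leading 1L
# covers the first component, which diff() has no entry for.
lris <- c(1L, cumsum(brk) + 1L)
lris
## [1] 1 1 1 2 2 3 3 3
```

One caveat of this rule: if a record ends and the next one begins with a higher-index column (for example a record holding only Color followed by a record holding only HP), no break is detected and the two are merged into one row. The input alone cannot disambiguate that case.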
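Actual data
I grabbed your real data from Dropbox and stored it as testlist. Here are my findings.
First, I examined the unique component names in order of appearance, taking them as cns: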
## first reasonable assumption about cns
cns <- unique(names(testlist));
cns;
## [1] "Status" "Make" "Model"
## [4] "Kilometres" "Stock Number" "Engine"
## [7] "Number of Hours" "Front axle" "Rear axle"
## [10] "Suspension" "Wheelbase" "Transmission"
## [13] "Price" "Style/Trim" "Brakes"
## [16] "Mfg Exterior Colour" "Tires" "Engine (HP)"
## [19] "Exterior Colour"
From this we can compute a new, tentative lcis:
## examine lcis for ordering
lcis <- match(names(testlist),cns);
lcis;
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12
## [26] 13 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11
## [51] 12 13 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10
## [76] 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9
## [101] 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8
## [126] 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7
## [151] 8 9 10 11 12 13 1 2 3 4 14 13 1 2 3 4 5 6 7 8 9 10 11 12 13
## [176] 1 2 3 4 5 15 16 6 8 9 10 17 11 18 12 19 13 1 2 3 4 5 15 16 6
## [201] 8 9 10 17 11 18 12 19 13
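Looking closely at the vector above, we can see that it begins with many regular repetitions of 1:13. In fact, only near the end of the vector does it become irregular: we see 14 followed by 13, 16 followed by 6, 10-11-12 interleaved with 17-18-19, and so on.
But one important observation we can make here is that the vector appears to consist of groups delimited by 1 and 13. In other words, every range that shows some regularity (even where there is also some irregularity) begins with 1 and ends with 13. This observation is consistent with your comment about the data being out of order in the middle of a vehicle. Let's call this the 1/13 hypothesis.
We can see the groups more clearly by splitting on this 1/13 boundary: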
## recognizing 1/13 consistency, split on it to see how each (possible) row looks under this assumption
split(lcis,cumsum(lcis==1L));
## $`1`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`2`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`3`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`4`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`5`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`6`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`7`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`8`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`9`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`10`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`11`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`12`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`13`
## [1] 1 2 3 4 14 13
##
## $`14`
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13
##
## $`15`
## [1] 1 2 3 4 5 15 16 6 8 9 10 17 11 18 12 19 13
##
## $`16`
## [1] 1 2 3 4 5 15 16 6 8 9 10 17 11 18 12 19 13
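Now, if you look very carefully at the groups above, you will see that cns can be reordered so that every group is in ascending order. The indexes will not be contiguous, but the solution I devised for the original problem does not require contiguity; all that is necessary is ascending order.
For example, we need column 14 to come before column 13, and we need columns 15 and 16 to come before columns 6, 8, 9, and so on: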
## recognizing the possibility of reordering to achieve perfect within-row ascending order, reorder cns to cns2
cns2 <- cns[c(1,2,3,4,14,5,15,16,6,7,8,9,10,17,11,18,12,19,13)];
cns2;
## [1] "Status" "Make" "Model"
## [4] "Kilometres" "Style/Trim" "Stock Number"
## [7] "Brakes" "Mfg Exterior Colour" "Engine"
## [10] "Number of Hours" "Front axle" "Rear axle"
## [13] "Suspension" "Tires" "Wheelbase"
## [16] "Engine (HP)" "Transmission" "Exterior Colour"
## [19] "Price"
Now we can recompute lcis, which I will call lcis2, and demonstrate the new ordering within each group:
## calculate lcis2 from cns2, and prove that we've successfully ordered each individual row under the 1/13 (now 1/19) break assumption
lcis2 <- match(names(testlist),cns2);
split(lcis2,cumsum(lcis2==1L));
## $`1`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`2`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`3`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`4`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`5`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`6`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`7`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`8`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`9`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`10`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`11`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`12`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`13`
## [1] 1 2 3 4 5 19
##
## $`14`
## [1] 1 2 3 4 6 9 10 11 12 13 15 17 19
##
## $`15`
## [1] 1 2 3 4 6 7 8 9 11 12 13 14 15 16 17 18 19
##
## $`16`
## [1] 1 2 3 4 6 7 8 9 11 12 13 14 15 16 17 18 19
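Rather than inspecting the groups by eye, the ascending-order property can also be checked programmatically. The following is a minimal sketch on small made-up index vectors, not the Dropbox data; the helper name `is_ascending_by_group` is hypothetical.

```r
# Hypothetical helper: TRUE if every group that starts at column index 1
# is strictly increasing.
is_ascending_by_group <- function(idx) {
  groups <- split(idx, cumsum(idx == 1L))
  all(vapply(groups, function(g) all(diff(g) > 0L), logical(1)))
}

# Every group ascends, gaps are allowed.
is_ascending_by_group(c(1L, 2L, 3L, 1L, 3L, 1L, 2L))
## [1] TRUE

# The first group contains 5 followed by 3, so the check fails.
is_ascending_by_group(c(1L, 2L, 5L, 3L, 1L, 2L))
## [1] FALSE
```

Under the same assumptions, calling `is_ascending_by_group(lcis2)` would confirm that the reordering into cns2 achieved its goal.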
Finally, we can run the whole solution, now taking care to use the 2-suffixed variable names:
## now we can apply the preallocate/fill-in solution using cns2 and lcis2
## will use lris2 and df2 just to be consistent
lris2 <- c(1L,cumsum(diff(lcis2)<=0L)+1L);
df2 <- as.data.frame(testlist[match(1:length(cns2),lcis2)],stringsAsFactors=F)[0,];
df2[max(lris2),] <- NA;
df2;
## Status Make Model Kilometres Style.Trim Stock.Number Brakes Mfg.Exterior.Colour Engine Number.of.Hours Front.axle Rear.axle Suspension Tires Wheelbase Engine..HP. Transmission Exterior.Colour Price
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 7 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 10 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 11 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 12 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 13 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 14 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 15 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 16 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
for (ci in 1:length(cns2)) { m <- lcis2==ci; df2[lris2[m],ci] <- do.call(c,testlist[m]); };
df2;
## Status Make Model Kilometres Style.Trim Stock.Number Brakes Mfg.Exterior.Colour Engine Number.of.Hours Front.axle Rear.axle Suspension Tires Wheelbase Engine..HP. Transmission Exterior.Colour Price
## 1 New Peterbilt 367 Tri-Drive c/w 58'' Sleeper 3,360 km <NA> 12949 <NA> <NA> Cummins ISX15 (550 hp) 44 Dana Spicer D2000 (20,000lb) Dana T69-170 (wide track) t Peterbilt Air-Trak (66,000lb) <NA> 267'' <NA> RTLO18918B Fuller (18 speed) <NA> 7,770
## 2 New Kenworth T800 T/A Tractor 82,230 km <NA> 10720 <NA> <NA> Cummins ISX15 (550hp) 2,712 Dana Spicer D2000 (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252 (52,000lb) Air <NA> 244'' <NA> Fuller 18 spd main AT1202 2 sp <NA> 9,500
## 3 New Kenworth T800 Tandem Tractor w/ 38'' Sleeper 98,521 km <NA> 10722 <NA> <NA> Cummins ISX15 (550hp) 2,790 Dana Spicer D2000 (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252 (52,000lb) Air <NA> 244'' <NA> Fuller 18 spd main AT1202 2 sp <NA> 9,500
## 4 Used Kenworth W900 Tri-Drive Sleeper Truck Tractor 170,422 km <NA> 13227 <NA> <NA> Cummins ISX15 (600 hp) 4,925 Meritor FL941 (20,000 lb) Meritor RZ-166 (69,000 lb) Kenworth AG690 (69,000lb) Air <NA> 259'' <NA> 18 speed main & 4 speed au <NA> 7,750
## 5 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,367 km <NA> 12180 <NA> <NA> Cummins ISX15 (550hp) 38 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 3,300
## 6 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,421 km <NA> 12179 <NA> <NA> Cummins ISX15 (550hp) 46 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 3,300
## 7 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 2,157 km <NA> 12181 <NA> <NA> Cummins ISX15 (550hp) 64 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 9,880
## 8 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,444 km <NA> 12954 <NA> <NA> Cummins ISX15 (550hp) 45 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 9,880
## 9 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,427 km <NA> 12955 <NA> <NA> Cummins ISX15 (550hp) 43 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 9,880
## 10 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,982 km <NA> 12182 <NA> <NA> Cummins ISX15 (550hp) 78 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 9,880
## 11 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 23,293 km <NA> 12953 <NA> <NA> Cummins ISX15 (550hp) 394 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 9,880
## 12 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 27,215 km <NA> 12509 <NA> <NA> Cummins ISX15 (550hp) 458 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 6,600
## 13 Used Volvo VNL64T 780-730 72,000 km VNL64T780-730 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 5,000
## 14 New Peterbilt 367 T/A Wet Kit Tractor c/w 58'' Sleeper 60,657 km <NA> 10838 <NA> <NA> Cummins ISX15 (550hp) 1,822 Dana Spicer E14621 (14,600 lb Dana D46-170HP (46,000lb) tand Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 1,800
## 15 Used International ProStar +122 36,236 km <NA> 463555 Air White Cummins ISX <NA> Arvin Meritor 13200 lb Arvin Meritor 40000 lb Int'l IROS 11R22.5 228 in 450 Eaton Fuller D/O (18 spd) White 8,750
## 16 Used International ProStar +122 33,000 km <NA> 463543 Air White Cummins ISX <NA> Arvin Meritor 13200 lb Arvin Meritor 46000 lb Int'l IROS 11R/22.5 236 in 475 Eaton Fuller D/O (18 spd) White 5,900
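Now, it occurs to me that it might be preferable to move away from the "ascending-order assumption" (as we might call it) entirely and rely on the 1/13 hypothesis instead, which we can do by changing how lris is calculated. This removes the need to reorder cns from the order returned by the unique() call.
Below I demonstrate this, reverting to the unsuffixed variable names, which will prove useful shortly: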
## change lris calculation to depend directly on 1/13 assumption; don't bother reordering
cns <- unique(names(testlist));
lcis <- match(names(testlist),cns);
lris <- c(1L,cumsum(lcis[-1]==1L)+1L);
df <- as.data.frame(testlist[match(1:length(cns),lcis)],stringsAsFactors=F)[0,];
df[max(lris),] <- NA;
for (ci in 1:length(cns)) { m <- lcis==ci; df[lris[m],ci] <- do.call(c,testlist[m]); };
df;
## Status Make Model Kilometres Stock.Number Engine Number.of.Hours Front.axle Rear.axle Suspension Wheelbase Transmission Price Style.Trim Brakes Mfg.Exterior.Colour Tires Engine..HP. Exterior.Colour
## 1 New Peterbilt 367 Tri-Drive c/w 58'' Sleeper 3,360 km 12949 Cummins ISX15 (550 hp) 44 Dana Spicer D2000 (20,000lb) Dana T69-170 (wide track) t Peterbilt Air-Trak (66,000lb) 267'' RTLO18918B Fuller (18 speed) 7,770 <NA> <NA> <NA> <NA> <NA> <NA>
## 2 New Kenworth T800 T/A Tractor 82,230 km 10720 Cummins ISX15 (550hp) 2,712 Dana Spicer D2000 (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252 (52,000lb) Air 244'' Fuller 18 spd main AT1202 2 sp 9,500 <NA> <NA> <NA> <NA> <NA> <NA>
## 3 New Kenworth T800 Tandem Tractor w/ 38'' Sleeper 98,521 km 10722 Cummins ISX15 (550hp) 2,790 Dana Spicer D2000 (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252 (52,000lb) Air 244'' Fuller 18 spd main AT1202 2 sp 9,500 <NA> <NA> <NA> <NA> <NA> <NA>
## 4 Used Kenworth W900 Tri-Drive Sleeper Truck Tractor 170,422 km 13227 Cummins ISX15 (600 hp) 4,925 Meritor FL941 (20,000 lb) Meritor RZ-166 (69,000 lb) Kenworth AG690 (69,000lb) Air 259'' 18 speed main & 4 speed au 7,750 <NA> <NA> <NA> <NA> <NA> <NA>
## 5 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,367 km 12180 Cummins ISX15 (550hp) 38 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 3,300 <NA> <NA> <NA> <NA> <NA> <NA>
## 6 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,421 km 12179 Cummins ISX15 (550hp) 46 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 3,300 <NA> <NA> <NA> <NA> <NA> <NA>
## 7 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 2,157 km 12181 Cummins ISX15 (550hp) 64 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 9,880 <NA> <NA> <NA> <NA> <NA> <NA>
## 8 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,444 km 12954 Cummins ISX15 (550hp) 45 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 9,880 <NA> <NA> <NA> <NA> <NA> <NA>
## 9 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,427 km 12955 Cummins ISX15 (550hp) 43 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 9,880 <NA> <NA> <NA> <NA> <NA> <NA>
## 10 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,982 km 12182 Cummins ISX15 (550hp) 78 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 9,880 <NA> <NA> <NA> <NA> <NA> <NA>
## 11 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 23,293 km 12953 Cummins ISX15 (550hp) 394 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 9,880 <NA> <NA> <NA> <NA> <NA> <NA>
## 12 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 27,215 km 12509 Cummins ISX15 (550hp) 458 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 6,600 <NA> <NA> <NA> <NA> <NA> <NA>
## 13 Used Volvo VNL64T 780-730 72,000 km <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 5,000 VNL64T780-730 <NA> <NA> <NA> <NA> <NA>
## 14 New Peterbilt 367 T/A Wet Kit Tractor c/w 58'' Sleeper 60,657 km 10838 Cummins ISX15 (550hp) 1,822 Dana Spicer E14621 (14,600 lb Dana D46-170HP (46,000lb) tand Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 1,800 <NA> <NA> <NA> <NA> <NA> <NA>
## 15 Used International ProStar +122 36,236 km 463555 Cummins ISX <NA> Arvin Meritor 13200 lb Arvin Meritor 40000 lb Int'l IROS 228 in Eaton Fuller D/O (18 spd) 8,750 <NA> Air White 11R22.5 450 White
## 16 Used International ProStar +122 33,000 km 463543 Cummins ISX <NA> Arvin Meritor 13200 lb Arvin Meritor 46000 lb Int'l IROS 236 in Eaton Fuller D/O (18 spd) 5,900 <NA> Air White 11R/22.5 475 White
As you can see, the column order of df differs from that of df2, but we can prove that the data are identical:
## prove df2 and df are identical, ignoring the column order difference
identical(df,df2[names(df)]);
## [1] TRUE
The best solution I could come up with
library(data.table)
listnames <- names(testlist)
# "Color" "HP" "Type" "HP" "Type" "Color" "HP" "Type"
unames <- unique(listnames)
# "Color" "HP" "Type"
a <- setNames(1:length(unames), unames)
# Color HP Type
# 1 2 3
d <- unname(a[listnames])
# [1] 1 2 3 2 3 1 2 3
splitted_list <- split(testlist, cumsum(shift(d, fill=0)>d))
# results in testlist splitted by increasing sequences in d
# (1,2,3), (2,3), (1, 2, 3)
# You can impose a different splitting condition here, for instance,
# if each entry begins with 1, then cumsum(d==1) is adequate
# and the last step is pretty much self explanatory
rbindlist(lapply(splitted_list, data.frame), fill=TRUE)
# Color HP Type
# 1: Blue 405 Truck
# 2: NA 400 Car
# 3: White 500 Truck
Hope it solves your problem.
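To make the choice of splitting condition concrete, here is a small base-R sketch comparing the two group vectors on the toy index vector d from above. The variable `prev` is introduced here as a base-R stand-in for shift(d, fill=0); it is an assumption of this sketch, not part of the original answer.

```r
d <- c(1L, 2L, 3L, 2L, 3L, 1L, 2L, 3L)

# Previous value of d, with 0 filled in before the first element.
prev <- c(0L, head(d, -1))

# Condition 1: a new group starts wherever the column index decreases.
cumsum(prev > d)
## [1] 0 0 0 1 1 2 2 2

# Condition 2: a new group starts at every occurrence of column 1.
cumsum(d == 1L)
## [1] 1 1 1 1 1 2 2 2
```

On the toy data the second condition yields only two groups, because the second vehicle has no Color entry; it is only adequate when every record is known to begin with the first column.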
Applying the splitting condition cumsum(d==1) to the test data from Dropbox gives the following result:
structure(list(Status = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L), .Label = c("New", "Used"
), class = "factor"), Make = structure(c(1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 4L, 4L), .Label = c("Peterbilt",
"Kenworth", "Volvo", "International"), class = "factor"), Model = structure(c(1L,
2L, 3L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 7L, 8L, 8L), .Label = c("367 Tri-Drive c/w 58'' Sleeper",
"T800 T/A Tractor", "T800 Tandem Tractor w/ 38'' Sleeper", "W900 Tri-Drive Sleeper Truck Tractor",
"367 T/A Wet-Kit Tractor c/w 58'' Sleeper", "VNL64T 780-730",
"367 T/A Wet Kit Tractor c/w 58'' Sleeper", "ProStar +122"
), class = "factor"), Kilometres = structure(1:16, .Label = c("3,360 km",
"82,230 km", "98,521 km", "170,422 km", "3,367 km", "3,421 km",
"2,157 km", "3,444 km", "3,427 km", "3,982 km", "23,293 km",
"27,215 km", "72,000 km", "60,657 km", "36,236 km", "33,000 km"
), class = "factor"), Stock.Number = structure(c(1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, NA, 13L, 14L, 15L), .Label = c("12949",
"10720", "10722", "13227", "12180", "12179", "12181", "12954",
"12955", "12182", "12953", "12509", "10838", "463555", "463543"
), class = "factor"), Engine = structure(c(1L, 2L, 2L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, NA, 4L, 5L, 5L), .Label = c("Cummins ISX15 (550 hp)",
"Cummins ISX15 (550hp)", "Cummins ISX15 (600 hp)", "Cummins ISX15 (550hp)",
"Cummins ISX"), class = "factor"), Number.of.Hours = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, NA, 13L, NA, NA
), .Label = c("44", "2,712", "2,790", "4,925", "38", "46", "64",
"45", "43", "78", "394", "458", "1,822"), class = "factor"),
Front.axle = structure(c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, NA, 4L, 5L, 5L), .Label = c("Dana Spicer D2000 (20,000lb)",
"Dana Spicer D2000 (20,000 lb)", "Meritor FL941 (20,000 lb)",
"Dana Spicer E14621 (14,600 lb", "Arvin Meritor 13200 lb"
), class = "factor"), Rear.axle = structure(c(1L, 2L, 2L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, NA, 5L, 6L, 7L), .Label = c("Dana T69-170 (wide track) t",
"Dana D46-170HPW (46,000 lb) ta", "Meritor RZ-166 (69,000 lb)",
"Dana D46-170 (46,000lb) ta", "Dana D46-170HP (46,000lb) tand",
"Arvin Meritor 40000 lb", "Arvin Meritor 46000 lb"), class = "factor"),
Suspension = structure(c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, NA, 4L, 5L, 5L), .Label = c("Peterbilt Air-Trak (66,000lb)",
"Neway ADZ252 (52,000lb) Air", "Kenworth AG690 (69,000lb) Air",
"Peterbilt Air-Trak (46,000lb)", "Int'l IROS"), class = "factor"),
Wheelbase = structure(c(1L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, NA, 2L, 4L, 5L), .Label = c("267''", "244''",
"259''", "228 in", "236 in"), class = "factor"), Transmission = structure(c(1L,
2L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 4L, 4L
), .Label = c("RTLO18918B Fuller (18 speed)", "Fuller 18 spd main AT1202 2 sp",
"18 speed main & 4 speed au", "Eaton Fuller D/O (18 spd)"
), class = "factor"), Price = structure(c(1L, 2L, 2L, 3L,
4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("7,770",
"9,500", "7,750", "3,300", "9,880", "6,600",
"5,000", "1,800", "8,750", "5,900"), class = "factor"),
Style.Trim = structure(c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 1L, NA, NA, NA), .Label = "VNL64T780-730", class = "factor"),
Brakes = structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1L, 1L), .Label = "Air", class = "factor"),
Mfg.Exterior.Colour = structure(c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L), .Label = "White", class = "factor"),
Tires = structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 1L, 2L), .Label = c("11R22.5", "11R/22.5"
), class = "factor"), Engine..HP. = structure(c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 2L), .Label = c("450",
"475"), class = "factor"), Exterior.Colour = structure(c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L
), .Label = "White", class = "factor")), .Names = c("Status",
"Make", "Model", "Kilometres", "Stock.Number", "Engine", "Number.of.Hours",
"Front.axle", "Rear.axle", "Suspension", "Wheelbase", "Transmission",
"Price", "Style.Trim", "Brakes", "Mfg.Exterior.Colour", "Tires",
"Engine..HP.", "Exterior.Colour"), row.names = c(NA, -16L), class = "data.frame")
## 6 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,421 km <NA> 12179 <NA> <NA> Cummins ISX15 (550hp) 46 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 3,300
## 7 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 2,157 km <NA> 12181 <NA> <NA> Cummins ISX15 (550hp) 64 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 9,880
## 8 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,444 km <NA> 12954 <NA> <NA> Cummins ISX15 (550hp) 45 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 9,880
## 9 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,427 km <NA> 12955 <NA> <NA> Cummins ISX15 (550hp) 43 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 9,880
## 10 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,982 km <NA> 12182 <NA> <NA> Cummins ISX15 (550hp) 78 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 9,880
## 11 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 23,293 km <NA> 12953 <NA> <NA> Cummins ISX15 (550hp) 394 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 9,880
## 12 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 27,215 km <NA> 12509 <NA> <NA> Cummins ISX15 (550hp) 458 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 6,600
## 13 Used Volvo VNL64T 780-730 72,000 km VNL64T780-730 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 5,000
## 14 New Peterbilt 367 T/A Wet Kit Tractor c/w 58'' Sleeper 60,657 km <NA> 10838 <NA> <NA> Cummins ISX15 (550hp) 1,822 Dana Spicer E14621 (14,600 lb Dana D46-170HP (46,000lb) tand Peterbilt Air-Trak (46,000lb) <NA> 244'' <NA> RTLO18918B Fuller (18 speed) <NA> 1,800
## 15 Used International ProStar +122 36,236 km <NA> 463555 Air White Cummins ISX <NA> Arvin Meritor 13200 lb Arvin Meritor 40000 lb Int'l IROS 11R22.5 228 in 450 Eaton Fuller D/O (18 spd) White 8,750
## 16 Used International ProStar +122 33,000 km <NA> 463543 Air White Cummins ISX <NA> Arvin Meritor 13200 lb Arvin Meritor 46000 lb Int'l IROS 11R/22.5 236 in 475 Eaton Fuller D/O (18 spd) White 5,900
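Now, I realize it may be preferable to move away entirely from the "ascending-order assumption" (as we'll call it) to the 1/13 assumption, which we can do by changing the lris calculation. This spares us from having to reorder cns away from the order returned by the unique() call.
Below I demonstrate this, reverting to the unsuffixed variable names, which will prove useful, as will be seen shortly: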
## change lris calculation to depend directly on 1/13 assumption; don't bother reordering
cns <- unique(names(testlist));
lcis <- match(names(testlist),cns);
lris <- c(1L,cumsum(lcis[-1]==1L)+1L);
df <- as.data.frame(testlist[match(1:length(cns),lcis)],stringsAsFactors=FALSE)[0,];
df[max(lris),] <- NA;
for (ci in 1:length(cns)) { m <- lcis==ci; df[lris[m],ci] <- do.call(c,testlist[m]); };
df;
## Status Make Model Kilometres Stock.Number Engine Number.of.Hours Front.axle Rear.axle Suspension Wheelbase Transmission Price Style.Trim Brakes Mfg.Exterior.Colour Tires Engine..HP. Exterior.Colour
## 1 New Peterbilt 367 Tri-Drive c/w 58'' Sleeper 3,360 km 12949 Cummins ISX15 (550 hp) 44 Dana Spicer D2000 (20,000lb) Dana T69-170 (wide track) t Peterbilt Air-Trak (66,000lb) 267'' RTLO18918B Fuller (18 speed) 7,770 <NA> <NA> <NA> <NA> <NA> <NA>
## 2 New Kenworth T800 T/A Tractor 82,230 km 10720 Cummins ISX15 (550hp) 2,712 Dana Spicer D2000 (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252 (52,000lb) Air 244'' Fuller 18 spd main AT1202 2 sp 9,500 <NA> <NA> <NA> <NA> <NA> <NA>
## 3 New Kenworth T800 Tandem Tractor w/ 38'' Sleeper 98,521 km 10722 Cummins ISX15 (550hp) 2,790 Dana Spicer D2000 (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252 (52,000lb) Air 244'' Fuller 18 spd main AT1202 2 sp 9,500 <NA> <NA> <NA> <NA> <NA> <NA>
## 4 Used Kenworth W900 Tri-Drive Sleeper Truck Tractor 170,422 km 13227 Cummins ISX15 (600 hp) 4,925 Meritor FL941 (20,000 lb) Meritor RZ-166 (69,000 lb) Kenworth AG690 (69,000lb) Air 259'' 18 speed main & 4 speed au 7,750 <NA> <NA> <NA> <NA> <NA> <NA>
## 5 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,367 km 12180 Cummins ISX15 (550hp) 38 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 3,300 <NA> <NA> <NA> <NA> <NA> <NA>
## 6 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,421 km 12179 Cummins ISX15 (550hp) 46 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 3,300 <NA> <NA> <NA> <NA> <NA> <NA>
## 7 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 2,157 km 12181 Cummins ISX15 (550hp) 64 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 9,880 <NA> <NA> <NA> <NA> <NA> <NA>
## 8 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,444 km 12954 Cummins ISX15 (550hp) 45 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 9,880 <NA> <NA> <NA> <NA> <NA> <NA>
## 9 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,427 km 12955 Cummins ISX15 (550hp) 43 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 9,880 <NA> <NA> <NA> <NA> <NA> <NA>
## 10 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 3,982 km 12182 Cummins ISX15 (550hp) 78 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 9,880 <NA> <NA> <NA> <NA> <NA> <NA>
## 11 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 23,293 km 12953 Cummins ISX15 (550hp) 394 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 9,880 <NA> <NA> <NA> <NA> <NA> <NA>
## 12 New Peterbilt 367 T/A Wet-Kit Tractor c/w 58'' Sleeper 27,215 km 12509 Cummins ISX15 (550hp) 458 Dana Spicer E14621 (14,600 lb Dana D46-170 (46,000lb) ta Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 6,600 <NA> <NA> <NA> <NA> <NA> <NA>
## 13 Used Volvo VNL64T 780-730 72,000 km <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 5,000 VNL64T780-730 <NA> <NA> <NA> <NA> <NA>
## 14 New Peterbilt 367 T/A Wet Kit Tractor c/w 58'' Sleeper 60,657 km 10838 Cummins ISX15 (550hp) 1,822 Dana Spicer E14621 (14,600 lb Dana D46-170HP (46,000lb) tand Peterbilt Air-Trak (46,000lb) 244'' RTLO18918B Fuller (18 speed) 1,800 <NA> <NA> <NA> <NA> <NA> <NA>
## 15 Used International ProStar +122 36,236 km 463555 Cummins ISX <NA> Arvin Meritor 13200 lb Arvin Meritor 40000 lb Int'l IROS 228 in Eaton Fuller D/O (18 spd) 8,750 <NA> Air White 11R22.5 450 White
## 16 Used International ProStar +122 33,000 km 463543 Cummins ISX <NA> Arvin Meritor 13200 lb Arvin Meritor 46000 lb Int'l IROS 236 in Eaton Fuller D/O (18 spd) 5,900 <NA> Air White 11R/22.5 475 White
As you can see, the column order of df differs from that of df2, but we can prove that the data is identical:
## prove df2 and df are identical, ignoring the column order difference
identical(df,df2[names(df)]);
## [1] TRUE
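To make the difference between the two lris calculations concrete, here is a minimal sketch that applies both to the small Color/HP/Type list from the question instead of the truck data. The names lris_asc and lris_first are illustrative only and do not appear in the code above. Because the second vehicle has no Color entry, only the ascending-order version finds three rows here; the version that keys on the first column name is appropriate only when every record is known to begin with the same field.

```r
# Toy list from the question: the second vehicle has no Color entry.
testlist <- list(Color = "Blue", HP = "405", Type = "Truck", HP = "400",
                 Type = "Car", Color = "White", HP = "500", Type = "Truck")
cns  <- unique(names(testlist))        # "Color" "HP" "Type"
lcis <- match(names(testlist), cns)    # 1 2 3 2 3 1 2 3

# Ascending-order assumption: a new row starts whenever the column index
# fails to increase.
lris_asc <- c(1L, cumsum(diff(lcis) <= 0L) + 1L)     # 1 1 1 2 2 3 3 3

# First-column assumption (hypothetical name): a new row starts only when
# column 1 reappears, so the first two vehicles are merged on this data.
lris_first <- c(1L, cumsum(lcis[-1] == 1L) + 1L)     # 1 1 1 1 1 2 2 2

# Preallocate and fill in, using the ascending-order row indices.
df <- as.data.frame(testlist[match(seq_along(cns), lcis)],
                    stringsAsFactors = FALSE)[0, ]
df[max(lris_asc), ] <- NA
for (ci in seq_along(cns)) {
  m <- lcis == ci
  df[lris_asc[m], ci] <- do.call(c, testlist[m])
}
df
```

With lris_asc, the Color cell of the second row stays NA, which is the layout the question asked for.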
The best solution I could come up with:
library(data.table)
listnames <- names(testlist)
# "Color" "HP" "Type" "HP" "Type" "Color" "HP" "Type"
unames <- unique(listnames)
# "Color" "HP" "Type"
a <- setNames(1:length(unames), unames)
# Color HP Type
# 1 2 3
d <- unname(a[listnames])
# [1] 1 2 3 2 3 1 2 3
splitted_list <- split(testlist, cumsum(shift(d, fill=0)>d))
# results in testlist split into increasing runs of d
# (1,2,3), (2,3), (1, 2, 3)
# You can impose a different splitting condition here, for instance,
# if each entry begins with 1, then cumsum(d==1) is adequate
# and the last step is pretty much self explanatory
rbindlist(lapply(splitted_list, data.frame), fill=TRUE)
# Color HP Type
# 1: Blue 405 Truck
# 2: NA 400 Car
# 3: White 500 Truck
Hope this solves your problem.
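For readers without data.table installed, the same split-then-bind idea can be sketched in base R. This is only a rough stand-in under stated assumptions: c(0, head(d, -1)) plays the role of shift(d, fill=0), and a manual NA-padding step replaces rbindlist(..., fill=TRUE); it does not reproduce rbindlist's type handling. The names grp, pieces, rows and res are illustrative.

```r
testlist <- list(Color = "Blue", HP = "405", Type = "Truck", HP = "400",
                 Type = "Car", Color = "White", HP = "500", Type = "Truck")
unames <- unique(names(testlist))     # "Color" "HP" "Type"
d <- match(names(testlist), unames)   # 1 2 3 2 3 1 2 3

# Base-R stand-in for shift(d, fill = 0) > d: compare each index with the
# previous one and start a new group whenever the index decreases.
grp <- cumsum(c(0, head(d, -1)) > d)  # 0 0 0 1 1 2 2 2

pieces <- split(testlist, grp)

# Pad every piece to the full set of names; missing fields become NA.
rows <- lapply(pieces, function(p) setNames(unlist(p)[unames], unames))

res <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
rownames(res) <- NULL
res
```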
When the splitting condition cumsum(d==1) is applied to the test data from Dropbox, the result is:
structure(list(Status = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L), .Label = c("New", "Used"
), class = "factor"), Make = structure(c(1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 4L, 4L), .Label = c("Peterbilt",
"Kenworth", "Volvo", "International"), class = "factor"), Model = structure(c(1L,
2L, 3L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 7L, 8L, 8L), .Label = c("367 Tri-Drive c/w 58'' Sleeper",
"T800 T/A Tractor", "T800 Tandem Tractor w/ 38'' Sleeper", "W900 Tri-Drive Sleeper Truck Tractor",
"367 T/A Wet-Kit Tractor c/w 58'' Sleeper", "VNL64T 780-730",
"367 T/A Wet Kit Tractor c/w 58'' Sleeper", "ProStar +122"
), class = "factor"), Kilometres = structure(1:16, .Label = c("3,360 km",
"82,230 km", "98,521 km", "170,422 km", "3,367 km", "3,421 km",
"2,157 km", "3,444 km", "3,427 km", "3,982 km", "23,293 km",
"27,215 km", "72,000 km", "60,657 km", "36,236 km", "33,000 km"
), class = "factor"), Stock.Number = structure(c(1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, NA, 13L, 14L, 15L), .Label = c("12949",
"10720", "10722", "13227", "12180", "12179", "12181", "12954",
"12955", "12182", "12953", "12509", "10838", "463555", "463543"
), class = "factor"), Engine = structure(c(1L, 2L, 2L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, NA, 4L, 5L, 5L), .Label = c("Cummins ISX15 (550 hp)",
"Cummins ISX15 (550hp)", "Cummins ISX15 (600 hp)", "Cummins ISX15 (550hp)",
"Cummins ISX"), class = "factor"), Number.of.Hours = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, NA, 13L, NA, NA
), .Label = c("44", "2,712", "2,790", "4,925", "38", "46", "64",
"45", "43", "78", "394", "458", "1,822"), class = "factor"),
Front.axle = structure(c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, NA, 4L, 5L, 5L), .Label = c("Dana Spicer D2000 (20,000lb)",
"Dana Spicer D2000 (20,000 lb)", "Meritor FL941 (20,000 lb)",
"Dana Spicer E14621 (14,600 lb", "Arvin Meritor 13200 lb"
), class = "factor"), Rear.axle = structure(c(1L, 2L, 2L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, NA, 5L, 6L, 7L), .Label = c("Dana T69-170 (wide track) t",
"Dana D46-170HPW (46,000 lb) ta", "Meritor RZ-166 (69,000 lb)",
"Dana D46-170 (46,000lb) ta", "Dana D46-170HP (46,000lb) tand",
"Arvin Meritor 40000 lb", "Arvin Meritor 46000 lb"), class = "factor"),
Suspension = structure(c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, NA, 4L, 5L, 5L), .Label = c("Peterbilt Air-Trak (66,000lb)",
"Neway ADZ252 (52,000lb) Air", "Kenworth AG690 (69,000lb) Air",
"Peterbilt Air-Trak (46,000lb)", "Int'l IROS"), class = "factor"),
Wheelbase = structure(c(1L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, NA, 2L, 4L, 5L), .Label = c("267''", "244''",
"259''", "228 in", "236 in"), class = "factor"), Transmission = structure(c(1L,
2L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 4L, 4L
), .Label = c("RTLO18918B Fuller (18 speed)", "Fuller 18 spd main AT1202 2 sp",
"18 speed main & 4 speed au", "Eaton Fuller D/O (18 spd)"
), class = "factor"), Price = structure(c(1L, 2L, 2L, 3L,
4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("7,770",
"9,500", "7,750", "3,300", "9,880", "6,600",
"5,000", "1,800", "8,750", "5,900"), class = "factor"),
Style.Trim = structure(c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 1L, NA, NA, NA), .Label = "VNL64T780-730", class = "factor"),
Brakes = structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1L, 1L), .Label = "Air", class = "factor"),
Mfg.Exterior.Colour = structure(c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L), .Label = "White", class = "factor"),
Tires = structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 1L, 2L), .Label = c("11R22.5", "11R/22.5"
), class = "factor"), Engine..HP. = structure(c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 2L), .Label = c("450",
"475"), class = "factor"), Exterior.Colour = structure(c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L
), .Label = "White", class = "factor")), .Names = c("Status",
"Make", "Model", "Kilometres", "Stock.Number", "Engine", "Number.of.Hours",
"Front.axle", "Rear.axle", "Suspension", "Wheelbase", "Transmission",
"Price", "Style.Trim", "Brakes", "Mfg.Exterior.Colour", "Tires",
"Engine..HP.", "Exterior.Colour"), row.names = c(NA, -16L), class = "data.frame")