How does one specify a primary key when using dplyr copy_to()?
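I want to create a new table with copy_to() from dplyr. Specifying indexes is easy, but I don't see any syntax for specifying a primary key. Is the recommended approach to create the table first and then use copy_to() as a second step to copy the data into it, or is there a single-step way to specify the primary key in copy_to()?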
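For comparison, the two-step route can be sketched as follows. This is a minimal sketch. It assumes the DBI, RSQLite, dplyr and dbplyr packages are installed and uses an in-memory SQLite database. The table name `flights_small` and the three-row data frame are hypothetical stand-ins for the real data. As far as I can tell, `copy_to()` always creates its own table. It errors if the table already exists, and with `overwrite = TRUE` it drops and recreates the table, which would lose the key. For that reason, the second step below uses `DBI::dbAppendTable()` rather than `copy_to()`.

```r
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Step 1: create the table yourself, declaring the primary key in the DDL
dbExecute(con, "
  CREATE TABLE flights_small (
    id      INTEGER PRIMARY KEY,
    carrier TEXT,
    dest    TEXT
  )")

# Hypothetical toy data standing in for nycflights13::flights
df <- data.frame(
  id      = 1:3,
  carrier = c("UA", "AA", "B6"),
  dest    = c("IAH", "MIA", "BQN")
)

# Step 2: append the rows into the existing table
dbAppendTable(con, "flights_small", df)

# Ordinary indexes can still be added separately
dbExecute(con, "CREATE INDEX flights_small_dest ON flights_small (dest)")

# Query the table through dplyr as usual
flights_small <- tbl(con, "flights_small")
result <- collect(flights_small)
```

Because the key is part of the `CREATE TABLE` statement, this route also works on SQLite, where a primary key cannot be added after the table exists.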
As John mentioned in the comments, Hadley gave an example of setting indexes in an old presentation, archived here. His example:
hflights_db <- src_sqlite("hflights.sqlite3",
create = TRUE)
copy_to(
dest = hflights_db,
df = as.data.frame(flights),
name = "flights",
indexes = list(
c("date", "hour"),
"plane",
"dest",
"arr"
), temporary = FALSE
)
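The sketch below checks what `indexes` actually produces, using an in-memory SQLite database. It assumes DBI, RSQLite, dplyr and dbplyr are installed. The table `toy` and its data are hypothetical. SQLite's `PRAGMA table_info` reports a non-zero `pk` for primary-key columns, and `PRAGMA index_list` lists the table's indexes.

```r
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy data with an `id` column we would like to be the key
toy <- data.frame(id = 1:5, dest = c("IAH", "IAH", "MIA", "BQN", "ATL"))

# Same pattern as the example above: a permanent table plus indexes
copy_to(con, toy, name = "toy", temporary = FALSE,
        indexes = list("id", "dest"))

# pk is 0 for every column that is not part of a primary key
cols <- dbGetQuery(con, "PRAGMA table_info(toy)")
# unique is 0 for an ordinary (non-unique) index
idx <- dbGetQuery(con, "PRAGMA index_list(toy)")

cols[, c("name", "pk")]
idx[, c("name", "unique")]
```

Both indexes are created, but no column is marked as a primary key. The separate `unique_indexes` argument of `copy_to()` adds a uniqueness constraint, but that is still not a declared primary key.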
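However, indexes are not primary keys. I went through the dplyr and dbplyr source code to see whether a primary key can be specified, and it appears that it cannot. I did find a half-dplyr solution. Here is a working example.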
# packages
library(pacman)
p_load(nycflights13, dplyr, DBI)  # DBI provides dbExecute()
# connect to a standard MySQL database
# SQLite won't work here: it cannot add a primary key to an existing table.
# The key could be declared when the table is created, but then copy_to() could not be used.
hflights_db = src_mysql("test", host = "127.0.0.1", username = "root", password = "root")
# check that the connection works
hflights_db
# add a unique value to use as the primary key
flights$key = 1:nrow(flights)
#copy flights data
copy_to(
dest = hflights_db,
df = as.data.frame(flights),
name = "flights",
indexes = list(
"key",
c("year", "month", "day", "hour", "minute"),
"tailnum",
"dest",
"origin",
"carrier"
),
temporary = FALSE
)
#set primary key
dbExecute(hflights_db$con,
"ALTER TABLE flights
ADD PRIMARY KEY (`key`);"
)
#check status
tbl(hflights_db, "flights")
Output:
# Source: table<flights> [?? x 20]
# Database: mysql 5.7.19-0ubuntu0.16.04.1 [root@127.0.0.1:/test]
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest air_time distance hour
<int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 2013 1 1 517 515 2 830 819 11 UA 1545 N14228 EWR IAH 227 1400 5
2 2013 1 1 533 529 4 850 830 20 UA 1714 N24211 LGA IAH 227 1416 5
3 2013 1 1 542 540 2 923 850 33 AA 1141 N619AA JFK MIA 160 1089 5
4 2013 1 1 544 545 -1 1004 1022 -18 B6 725 N804JB JFK BQN 183 1576 5
5 2013 1 1 554 600 -6 812 837 -25 DL 461 N668DN LGA ATL 116 762 6
6 2013 1 1 554 558 -4 740 728 12 UA 1696 N39463 EWR ORD 150 719 5
7 2013 1 1 555 600 -5 913 854 19 B6 507 N516JB EWR FLL 158 1065 6
8 2013 1 1 557 600 -3 709 723 -14 EV 5708 N829AS LGA IAD 53 229 6
9 2013 1 1 557 600 -3 838 846 -8 B6 79 N593JB JFK MCO 140 944 6
10 2013 1 1 558 600 -2 753 745 8 AA 301 N3ALAA LGA ORD 138 733 6
# ... with more rows, and 3 more variables: minute <dbl>, time_hour <chr>, key <int>
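If this is needed more than once, the two steps above can be bundled into a small helper. The first step creates the table with `copy_to()`, and the second runs `ALTER TABLE ... ADD PRIMARY KEY`. This is a sketch, not part of dplyr or dbplyr. `add_pk_sql()` and `copy_to_with_pk()` are hypothetical names. `con` is assumed to be a DBI connection such as `hflights_db$con`. The `ALTER TABLE` step only works on backends that can add a primary key to an existing table, such as MySQL/MariaDB or PostgreSQL. `DBI::dbQuoteIdentifier()` quotes the names in the connection's own dialect.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Hypothetical helper: build "ALTER TABLE <table> ADD PRIMARY KEY (<cols>)",
# quoting identifiers for the SQL dialect of `con`.
add_pk_sql <- function(con, table, pk) {
  cols <- paste(dbQuoteIdentifier(con, pk), collapse = ", ")
  SQL(paste0("ALTER TABLE ", dbQuoteIdentifier(con, table),
             " ADD PRIMARY KEY (", cols, ")"))
}

# Hypothetical wrapper: create the table with copy_to(), then add the key.
# Extra arguments (e.g. `indexes`) are passed on to copy_to().
copy_to_with_pk <- function(con, df, name, pk, ...) {
  copy_to(con, df, name = name, temporary = FALSE, ...)
  dbExecute(con, add_pk_sql(con, name, pk))
  invisible(tbl(con, name))
}

# Inspect the statement for a composite key, using ANSI quoting
add_pk_sql(ANSI(), "flights", c("year", "month", "day"))
```

With the connection from the answer, the call would look like `copy_to_with_pk(hflights_db$con, as.data.frame(flights), "flights", pk = "key", indexes = list("tailnum", "dest"))`.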
As far as I can tell, dplyr does not display indexes or keys, so I used DBeaver to confirm that the primary key had been set.
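The key can also be checked from R instead of a GUI. Here is a minimal sketch using a hypothetical helper, `primary_key_columns()`. On SQLite, it reads `PRAGMA table_info`, whose `pk` column gives each column's position within the primary key. On any other connection, it assumes MySQL/MariaDB, where the primary-key constraint is always named `PRIMARY` in `information_schema.KEY_COLUMN_USAGE`. For the table above, `primary_key_columns(hflights_db$con, "flights")` should then return `"key"`.

```r
library(DBI)

# Hypothetical helper: return the primary-key column names of `table`,
# in key order (character(0) if the table has no primary key).
primary_key_columns <- function(con, table) {
  if (inherits(con, "SQLiteConnection")) {
    # pk = 0 for non-key columns, otherwise the 1-based position in the key
    info <- dbGetQuery(con, paste0("PRAGMA table_info(",
                                   dbQuoteIdentifier(con, table), ")"))
    key <- info[info$pk > 0, ]
    return(key$name[order(key$pk)])
  }
  # Assumption: MySQL/MariaDB, which names every primary-key constraint 'PRIMARY'
  sql <- sqlInterpolate(con, "
    SELECT COLUMN_NAME
    FROM information_schema.KEY_COLUMN_USAGE
    WHERE TABLE_SCHEMA = DATABASE()
      AND TABLE_NAME = ?table
      AND CONSTRAINT_NAME = 'PRIMARY'
    ORDER BY ORDINAL_POSITION", table = table)
  dbGetQuery(con, sql)$COLUMN_NAME
}

# Example on an in-memory SQLite database with a composite key
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE t (a INTEGER, b TEXT, PRIMARY KEY (b, a))")
primary_key_columns(con, "t")
```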