Running a parallel load into a ClickHouse Docker container in PowerShell
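I am trying to speed up loading my data into ClickHouse, hosted in Docker on Windows 10, through PowerShell, and I am wondering whether I can use parallel processes to load 4 files at the same time. I would appreciate help understanding whether this is possible, or some pointers on how to approach it. Below is the script I currently use to load the data: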
$files = Get-ChildItem "my_directory" | Sort-Object
foreach ($f in $files){
    $f.FullName | Write-Host
    #Log the start time and the file being loaded
    Get-Date | Write-Host
    "Start loading " + $f.FullName | Write-Host
    #Stream the CSV into a throwaway clickhouse-client container
    cat $f.FullName | docker run -i --rm --link ch:clickhouse-client yandex/clickhouse-client -m --host ch --query="INSERT INTO my_table FORMAT CSV"
    Get-Date | Write-Host
    "End loading " + $f.FullName | Write-Host
    [GC]::Collect()
}
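I am loading the files one at a time and would like to load 4 at once. Based on this link:
I have tried to put some code together, but could use some help checking whether I am on the right track: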
#I am assuming this is the code block of what to do
$block = {
    $f.FullName | Write-Host
    Get-Date | Write-Host
    "Start loading " + $f.FullName | Write-Host
    cat $f.FullName | docker run -i --rm --link ch:clickhouse-client yandex/clickhouse-client -m --host ch --query="INSERT INTO my_table FORMAT CSV"
    Get-Date | Write-Host
    "End loading " + $f.FullName | Write-Host
    [GC]::Collect()
}
#my directory of files
$files = Get-ChildItem "my_directory" | Sort-Object
#Remove all jobs
Get-Job | Remove-Job
$MaxThreads = 4
#Start the jobs. Max 4 jobs running simultaneously.
foreach($f in $files){
    While ($(Get-Job -State Running).count -ge $MaxThreads){
        Start-Sleep -Milliseconds 3
    }
    Start-Job -ScriptBlock $block -ArgumentList $f
}
#Wait for all jobs to finish.
While ($(Get-Job -State Running).count -gt 0){
    Start-Sleep 1
}
#Get information from each job.
foreach($job in Get-Job){
    $info = Receive-Job -Id ($job.Id)
}
#Remove all jobs created.
Get-Job | Remove-Job
I'm new to PowerShell; thanks for the help.
I believe I have solved this:
#Directory containing the files to load (an absolute path is safest, since a job may start in a different working directory)
$direc = "my_direc"
$files = Get-ChildItem $direc | Sort-Object
#create my block; a job does not see the caller's variables, so the directory and file name are passed in as parameters
$block = {
    param([string]$direc, [string]$file)
    cat $direc/$file | docker run -i --rm --link ch:clickhouse-client yandex/clickhouse-client -m --host ch --query="INSERT INTO test FORMAT CSV"
    [GC]::Collect()
}
#Remove all jobs
Get-Job | Remove-Job
$MaxThreads = 4
#Start the jobs. Max 4 jobs running simultaneously.
foreach($file in $files){
    While ($(Get-Job -State Running).count -ge $MaxThreads){
        Start-Sleep -Milliseconds 3
    }
    Start-Job -ScriptBlock $block -ArgumentList $direc, $file.Name
}
#Wait for all jobs to finish.
While ($(Get-Job -State Running).count -gt 0){
    Start-Sleep 1
}
#Get information from each job.
foreach($job in Get-Job){
    $info = Receive-Job -Id ($job.Id)
}
#Remove all jobs created.
Get-Job | Remove-Job
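For what it's worth, the polling loop with `Start-Sleep -Milliseconds 3` could probably be replaced by `Wait-Job -Any`, which blocks until one of the given jobs finishes. The following is only a rough sketch of that idea, not a tested drop-in for the loader: `Invoke-Throttled` and its parameter names are hypothetical, made up for this example, and it only tracks the jobs it starts itself, so it does not need to clear other jobs in the session.

```powershell
# Hypothetical helper (not part of the original post): run $ScriptBlock once
# per item, keeping at most $Throttle background jobs running at a time.
function Invoke-Throttled {
    param(
        [Parameter(Mandatory)] [object[]]    $Items,
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $Throttle = 4
    )

    $running = @()
    foreach ($item in $Items) {
        # Block until a slot frees up instead of polling with Start-Sleep.
        while ($running.Count -ge $Throttle) {
            $done = @(Wait-Job -Job $running -Any)
            Receive-Job -Job $done      # pass the finished job's output through
            Remove-Job  -Job $done
            $running = @($running | Where-Object { $done.Id -notcontains $_.Id })
        }
        # Each item is handed to the script block as its argument.
        $running += Start-Job -ScriptBlock $ScriptBlock -ArgumentList $item
    }

    # Drain the jobs that are still running after the last item was queued.
    if ($running.Count -gt 0) {
        $running | Wait-Job | Receive-Job
        $running | Remove-Job
    }
}
```

Assuming `$block` takes the full file path as its only parameter, the call would look something like `Invoke-Throttled -Items $files.FullName -ScriptBlock $block -Throttle 4`. Output and errors from each job come back through `Receive-Job` as the jobs finish, so a failed `docker run` should at least be visible rather than silently discarded.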