Using Powershell to export big output from Oracle to CSV
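I need to export a fairly large CSV file from Oracle every week.
I have tried two approaches:
- Adapter.Fill(DataSet)
- Looping through the columns and rows to save to the CSV file one row at a time.
The first one runs out of memory (the server machine only has 4 GB of RAM), and the second takes about an hour, because there are over 4 million rows to export.
Here is code #1: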
#Your query. It cannot contain any double quotes otherwise it will break.
$query = "SELECT manycolumns FROM somequery"
#Oracle login credentials and other variables
$username = "username"
$password = "password"
$datasource = "database address"
$output = "\NetworkLocation\Sales.csv"
#Creates a blank CSV file and makes sure it's in ASCII
Out-File $output -Force ascii
#Looks for the "Oracle.ManagedDataAccess.dll" file inside the "C:\Oracle" folder. We usually have two versions of Oracle installed, so the adapter can be in different locations. Needs changing if Oracle is installed elsewhere.
#Select-Object -First 1 keeps only the first match, so Add-Type gets a single path even if several copies of the DLL are found.
$location = Get-ChildItem -Path C:\Oracle -Filter Oracle.ManagedDataAccess.dll -Recurse -ErrorAction SilentlyContinue -Force | Select-Object -First 1
#Establishes connection to Oracle using the DLL file
Add-Type -Path $location.FullName
$connectionString = 'User Id=' + $username + ';Password=' + $password + ';Data Source=' + $datasource
$connection = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($connectionString)
$connection.open()
$command=$connection.CreateCommand()
$command.CommandText=$query
#Creates a table in memory and fills it with results from the query. Then, export the virtual table into CSV.
$DataSet = New-Object System.Data.DataSet
$Adapter = New-Object Oracle.ManagedDataAccess.Client.OracleDataAdapter($command)
$Adapter.Fill($DataSet)
$DataSet.Tables[0] | Export-Csv $output -NoTypeInformation
$connection.Close()
Here is #2:
#Your query. It cannot contain any double quotes otherwise it will break.
$query = "SELECT manycolumns FROM somequery"
#Oracle login credentials and other variables
$username = "username"
$password = "password"
$datasource = "database address"
$output = "\NetworkLocation\Sales.csv"
$tempfile = $env:TEMP + "\Temp.csv"
#Creates a blank CSV file and makes sure it's in ASCII
Out-File $tempfile -Force ascii
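#Looks for the "Oracle.ManagedDataAccess.dll" file inside the "C:\Oracle" folder. Needs changing if Oracle is installed elsewhere.
#Select-Object -First 1 keeps only the first match, so Add-Type gets a single path even if several copies of the DLL are found.
$location = Get-ChildItem -Path C:\Oracle -Filter Oracle.ManagedDataAccess.dll -Recurse -ErrorAction SilentlyContinue -Force | Select-Object -First 1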
#Establishes connection to Oracle using the DLL file
Add-Type -Path $location.FullName
$connectionString = 'User Id=' + $username + ';Password=' + $password + ';Data Source=' + $datasource
$connection = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($connectionString)
$connection.open()
$command=$connection.CreateCommand()
$command.CommandText=$query
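#Reads the results row by row, and each row column by column, so the number of columns does not have to be specified.
$reader = $command.ExecuteReader()
while($reader.Read()) {
#[ordered] keeps the CSV columns in the query's column order (a plain @{} hashtable does not)
$props = [ordered]@{}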
for($i = 0; $i -lt $reader.FieldCount; $i+=1) {
$name = $reader.GetName($i)
$value = $reader.item($i)
$props.Add($name, $value)
}
#Exports each line to CSV file. Works best when the file is on local drive as it saves it after each line.
new-object PSObject -Property $props | Export-Csv $tempfile -NoTypeInformation -Append
}
Move-Item $tempfile $output -Force
$connection.Close()
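Code #2 keeps memory use flat, but it is probably slow because it builds a new PSObject and starts a new Export-Csv -Append for every one of the 4 million rows, reopening the output file each time. A sketch of a middle ground, not benchmarked against the setup in the question: keep the DataReader from #2, but write every row through one System.IO.StreamWriter. Export-DataReaderToCsv is a hypothetical helper name. It quotes every field as Windows PowerShell's Export-Csv does, writes NULL as an empty field, and converts values with PowerShell's [string] cast (invariant culture), so dates and numbers may be formatted differently from Export-Csv. Pass a full path, because .NET resolves relative paths against the process working directory, not the current PowerShell location.

```powershell
# Hypothetical helper: stream all rows of an open IDataReader (for example an
# OracleDataReader) to a CSV file. Only one row is held in memory at a time,
# and the file is opened once instead of once per row.
function Export-DataReaderToCsv {
    param(
        [Parameter(Mandatory)] [System.Data.IDataReader] $Reader,
        [Parameter(Mandatory)] [string] $Path   # use a full path
    )
    $fieldCount = $Reader.FieldCount
    $values = New-Object object[] $fieldCount   # reused buffer for one row
    $fields = New-Object string[] $fieldCount   # quoted CSV fields for one row
    # ASCII to match the question's output; non-ASCII characters become '?'
    $writer = New-Object System.IO.StreamWriter($Path, $false, [System.Text.Encoding]::ASCII)
    try {
        # Header line from the column names, quoted like the data fields
        for ($i = 0; $i -lt $fieldCount; $i++) {
            $fields[$i] = '"' + $Reader.GetName($i).Replace('"', '""') + '"'
        }
        $writer.WriteLine($fields -join ',')

        $rowCount = 0
        while ($Reader.Read()) {
            [void]$Reader.GetValues($values)   # copy the whole row at once
            for ($i = 0; $i -lt $fieldCount; $i++) {
                $v = $values[$i]
                if ($v -is [System.DBNull]) {
                    $fields[$i] = ''           # NULL -> empty field
                } else {
                    # Quote every field and double any embedded quotes (RFC 4180 style)
                    $fields[$i] = '"' + ([string]$v).Replace('"', '""') + '"'
                }
            }
            $writer.WriteLine($fields -join ',')
            $rowCount++
        }
        $rowCount   # number of data rows written
    }
    finally {
        $writer.Dispose()   # flush and close even if a fetch fails
    }
}
```

With the objects from code #2, the while loop could be replaced by something like $reader = $command.ExecuteReader(); Export-DataReaderToCsv -Reader $reader -Path $tempfile; $reader.Close(). ODP.NET's OracleDataReader also has a FetchSize property; raising it may reduce network round trips if fetching turns out to be the bottleneck.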
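Ideally I would like to use the first code, since it is much faster than the second, but somehow avoid running out of memory.
Do you know if there is a way to "fill" the first 1 million records, append them to the CSV, clear the "DataSet" table, then do the next 1 million records, and so on? When the code finishes running, the CSV weighs about 1.3 GB, but while it runs even 8 GB of memory is not enough (my laptop has 8 GB, but the server only has 4 GB and it really struggles).
Any tips would be appreciated.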
In the *nix community, we love one-liners!
You can set the markup to 'csv on' in SQL*Plus (12.2 and later).
Create the query file:
cat > query.sql <<EOF
set head off
set feed off
set timing off
set trimspool on
set term off
spool output.csv
select
object_id,
owner,
object_name,
object_type,
status,
created,
last_ddl_time
from dba_objects;
spool off
exit;
EOF
Spool to output.csv like this:
sqlplus -s -m "CSV ON DELIM ',' QUOTE ON" user/password@\"localhost:1521/<my_service>\" @query.sql
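Since the server in the question runs PowerShell, the same SQL*Plus approach could also be driven from there. A minimal sketch, assuming SQL*Plus 12.2 or later is on the PATH; New-SqlPlusCsvScript is a hypothetical helper, and user, password and my_service are placeholders. One PowerShell-specific detail: the script argument must be quoted as '@query.sql', because an unquoted @query is parsed as splatting the variable $query.

```powershell
# Hypothetical helper: build a SQL*Plus script that spools a query to a file.
# The CSV formatting comes from the -m "CSV ON ..." option on the command line.
function New-SqlPlusCsvScript {
    param(
        [Parameter(Mandatory)] [string] $Query,      # without a trailing semicolon
        [Parameter(Mandatory)] [string] $SpoolFile
    )
    $lines = @(
        'set head off'
        'set feed off'
        'set timing off'
        'set trimspool on'
        'set term off'
        "spool $SpoolFile"
        "$Query;"
        'spool off'
        'exit;'
    )
    $lines -join [Environment]::NewLine
}

$scriptText = New-SqlPlusCsvScript -Query 'select object_id, owner, object_name, object_type, status, created, last_ddl_time from dba_objects' -SpoolFile 'output.csv'
Set-Content -Path 'query.sql' -Value $scriptText -Encoding Ascii

# Only attempt the export where SQL*Plus is installed
if (Get-Command sqlplus -ErrorAction SilentlyContinue) {
    # '@query.sql' is quoted: unquoted, PowerShell would treat @query as splatting
    & sqlplus -s -m "CSV ON DELIM ',' QUOTE ON" 'user/password@localhost:1521/my_service' '@query.sql'
}
```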
Another option is SQLcl (the SQL Developer command-line tool; its binary is named 'sql', which I renamed to 'sqlcl').
Create the query file (note: set term on|off!):
cat > query.sql <<EOF
set head off
set feed off
set timing off
set term off
set trimspool on
set sqlformat csv
spool output.csv
select
object_id,
owner,
object_name,
object_type,
status,
created,
last_ddl_time
from dba_objects
where rownum < 5;
spool off
exit;
EOF
Spool to output.csv like this:
sqlcl -s system/oracle@\"localhost:1521/XEPDB1\" @query.sql
Voilà!
cat output.csv
9,"SYS","I_FILE#_BLOCK#","INDEX","VALID",18.10.2018 07:49:04,18.10.2018 07:49:04
38,"SYS","I_OBJ3","INDEX","VALID",18.10.2018 07:49:04,18.10.2018 07:49:04
45,"SYS","I_TS1","INDEX","VALID",18.10.2018 07:49:04,18.10.2018 07:49:04
51,"SYS","I_CON1","INDEX","VALID",18.10.2018 07:49:04,18.10.2018 07:49:04
And the winner, on 77k rows, is sqlplus! (with the rownum < 5 filter removed)
time sqlcl -s system/oracle@\"localhost:1521/XEPDB1\" @query.sql
real 0m23.776s
user 0m39.542s
sys 0m1.293s
time sqlplus -s -m "CSV ON DELIM ',' QUOTE ON" system/oracle@localhost/XEPDB1 @query.sql
real 0m3.066s
user 0m0.700s
sys 0m0.265s
wc -l output.csv
77480 output.csv
You can experiment with output formats in SQL Developer by adding a format hint such as /*csv*/, /*html*/, /*json*/ or /*text*/ (there are many others):
select /*csv*/ * from dba_objects;
If you are loading CSV into a database, this tool can do it!
https://github.com/csv2db/csv2db
Good luck!
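Thank you all for the replies. I learned about Oracle scripts and SQL*Plus, which I did not know about before. I may use them in the future, but I think I would have to update my Oracle Developer package first.
I found a way to make my code work by following the documentation here:
https://docs.oracle.com/database/121/ODPNT/OracleDataAdapterClass.htm#i1002865
It is not perfect, because every 1 million rows it pauses, saves the output, and re-runs the query, which has to be evaluated again each time (the query I am running takes about 1-2 minutes to evaluate).
It is basically the same as running the code x times (where x is the upper bound of the row count, in millions): first with "FETCH FIRST 1000000 ROWS ONLY", then with "OFFSET 1000000 ROWS FETCH NEXT 1000000 ROWS ONLY", and so on, appending each result to the bottom of the CSV.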
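For reference, the manual paging described above would use Oracle 12c's row-limiting clause. One caveat applies both to that and to the Fill(DataSet, startRecord, maxRecords, srcTable) loop below: every chunk re-executes the query, and without an ORDER BY on a unique key Oracle does not guarantee the same row order between executions, so chunks could overlap or miss rows. Get-PagedQuery is a hypothetical helper, and the sales table and id column are placeholders; it assumes the base query has no ORDER BY of its own.

```powershell
# Hypothetical helper: wrap a base query in Oracle 12c+ row-limiting syntax.
# $OrderBy should name a unique key so every execution returns rows in the same order.
function Get-PagedQuery {
    param(
        [Parameter(Mandatory)] [string] $BaseQuery,   # must not contain its own ORDER BY
        [Parameter(Mandatory)] [string] $OrderBy,
        [long] $Offset = 0,
        [long] $PageSize = 1000000
    )
    if ($Offset -eq 0) {
        "$BaseQuery ORDER BY $OrderBy FETCH FIRST $PageSize ROWS ONLY"
    } else {
        "$BaseQuery ORDER BY $OrderBy OFFSET $Offset ROWS FETCH NEXT $PageSize ROWS ONLY"
    }
}

# Queries for the first three 1-million-row pages
foreach ($page in 0..2) {
    Get-PagedQuery -BaseQuery 'SELECT id, amount FROM sales' -OrderBy 'id' -Offset ($page * 1000000)
}
```

Even with a stable order, each OFFSET page still makes the database produce and skip all earlier rows, so later pages tend to get slower.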
Here is the code:
#Your query. It cannot contain any double quotes otherwise it will break.
$query = "SELECT
A lot of columns
FROM
a lot of tables joined together
WHERE
a lot of conditions
"
#Oracle login credentials and other variables
$username = "myusername"
$password = "mypassword"
$datasource = "TNSnameofmyDatasource"
$output = "$env:USERPROFILE\desktop\Sales.csv"
#creates a blank CSV file and make sure it's in ASCII as that's what the output of my query is
Out-File $output -Force ascii
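#Looks for the "Oracle.ManagedDataAccess.dll" file inside the "C:\Oracle" folder. Needs changing if Oracle is installed elsewhere.
#Select-Object -First 1 keeps only the first match, so Add-Type gets a single path even if several copies of the DLL are found.
$location = Get-ChildItem -Path C:\Oracle -Filter Oracle.ManagedDataAccess.dll -Recurse -ErrorAction SilentlyContinue -Force | Select-Object -First 1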
#Establishes connection to Oracle using the DLL file
Add-Type -Path $location.FullName
$connectionString = 'User Id=' + $username + ';Password=' + $password + ';Data Source=' + $datasource
$connection = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($connectionString)
$connection.open()
$command=$connection.CreateCommand()
$command.CommandText=$query
#Creates a table in memory to be filled up with results from the query using ODAC
$DataSet = New-Object System.Data.DataSet
$Adapter = New-Object Oracle.ManagedDataAccess.Client.OracleDataAdapter($command)
#Declaring variables for the loop
$fromrecord = 0
$numberofrecords = 1000000
$timesrun = 0
#Keep looping while the last fill returned exactly $numberofrecords rows (a full chunk means there may be more rows left)
while(($timesrun -eq 0) -or ($DataSet.Tables[0].Rows.Count -eq $numberofrecords))
{
$DataSet.Clear()
$Adapter.Fill($DataSet,$fromrecord,$numberofrecords,'*') | Out-Null #Suppresses writing to console the number of rows filled
Write-progress "Saved: $fromrecord Rows"
$DataSet.Tables[0] | Export-Csv $output -Append -NoTypeInformation
$fromrecord=$fromrecord+$numberofrecords
$timesrun++
}
$connection.Close()
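The loop above re-runs the query for every million rows. A variant closer to what the question originally asked for (fill a million rows, append them to the CSV, clear the table, continue) while executing the query only once is to keep one data reader open on the same connection and copy its rows into a DataTable in batches. A minimal sketch, not tested against Oracle; Export-ReaderInBatches is a hypothetical name. The Select-Object step drops the RowError, RowState, Table, ItemArray and HasErrors members that PowerShell typically exports alongside a DataRow's columns; excluding them is harmless if they are not present.

```powershell
# Hypothetical helper: copy rows from one open IDataReader into a DataTable in
# batches of $BatchSize, append each batch to the CSV with Export-Csv, then clear
# the table. The query runs once; memory stays bounded by one batch.
function Export-ReaderInBatches {
    param(
        [Parameter(Mandatory)] [System.Data.IDataReader] $Reader,
        [Parameter(Mandatory)] [string] $Path,
        [int] $BatchSize = 1000000
    )
    # Empty table with the reader's column names and types
    $table = New-Object System.Data.DataTable
    for ($i = 0; $i -lt $Reader.FieldCount; $i++) {
        [void]$table.Columns.Add($Reader.GetName($i), $Reader.GetFieldType($i))
    }
    # DataRow members that are not query columns
    $exclude = 'RowError', 'RowState', 'Table', 'ItemArray', 'HasErrors'
    $values = New-Object object[] $Reader.FieldCount   # reused row buffer
    $total = 0

    # Append the batch currently held in $table to the CSV, then empty the table
    $flush = {
        $table.Rows | Select-Object * -ExcludeProperty $exclude |
            Export-Csv -Path $Path -Append -NoTypeInformation
        $table.Clear()
    }

    while ($Reader.Read()) {
        [void]$Reader.GetValues($values)
        [void]$table.Rows.Add($values)   # values are copied into the new row
        $total++
        if ($table.Rows.Count -ge $BatchSize) { & $flush }
    }
    if ($table.Rows.Count -gt 0) { & $flush }   # last partial batch
    $total   # number of data rows written
}
```

With the connection and command above, it could replace the while loop with something like $reader = $command.ExecuteReader(); Export-ReaderInBatches -Reader $reader -Path $output; $reader.Close().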