Decompress .gz file in Azure Data Lake
How do I decompress and read a .gz file in Azure Data Lake using C# (ASP.NET)?
I tried the following code, but it throws an exception:
Exception: Could not find a part of the path 'D:\xxxxxx\filename'.
public static void Main(string[] args)
{
    // Obtain AAD token
    var creds = new ClientCredential(applicationId, clientSecret);
    var clientCreds = ApplicationTokenProvider.LoginSilentAsync(tenantId, creds).GetAwaiter().GetResult();
    // Create ADLS client object
    AdlsClient client = AdlsClient.CreateClient(adlsAccountFQDN, clientCreds);
    try
    {
        // Enumerate directory
        foreach (var entry in client.EnumerateDirectory("/Test/"))
        {
            try
            {
                string filename = entry.Name;
                using (Stream fileStream = File.OpenRead(filename), zippedStream = new GZipStream(fileStream, CompressionMode.Decompress))
                {
                    using (StreamReader reader = new StreamReader(zippedStream))
                    {
                        // work with reader
                        reader.ReadLine();
                    }
                }
            }
            catch (Exception ex)
            {
            }
        }
    }
    catch (AdlsException e)
    {
        PrintAdlsException(e);
    }
    Console.WriteLine("Done. Press ENTER to continue ...");
    Console.ReadLine();
}
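Why the exception mentions a local D:\ path: entry.Name is only the file's name, and File.OpenRead treats a bare name as a path on the local disk, relative to the process's current working directory, not as a path inside the Data Lake Store account. The sketch below illustrates that resolution with Path.GetFullPath; the helper LocalPathDemo is hypothetical, input.csv.gz is just a sample name, and nothing here calls Azure.

```csharp
using System.IO;

// Hypothetical helper illustrating why File.OpenRead(entry.Name) fails:
// System.IO resolves a bare file name against the local process's current
// directory, not against the Azure Data Lake Store account.
public static class LocalPathDemo
{
    public static string ResolveLocally(string fileName)
    {
        // For a relative name, Path.GetFullPath prepends the current working
        // directory (for example a folder under D:\ on a Windows machine).
        return Path.GetFullPath(fileName);
    }
}
```

Since the .gz file only exists in Data Lake Store, opening that local path fails, which matches the error in the question.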
The built-in extractors (Text, Csv, Tsv) now support gzip files natively, so you don't need to do anything special beyond reading them:
@data =
EXTRACT Timestamp DateTime,
Event string,
Value int
FROM "/input/input.csv.gz"
USING Extractors.Csv();
This also works with custom extractors:
@data =
EXTRACT Timestamp DateTime,
Event string,
Value int
FROM "/input/input.csv.gz"
USING new USQLworking.MyExtractor();
For further explanation from Michael Rys, see here.
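I found the solution.
Use client.GetReadStream(entry.FullName) instead of File.OpenRead(filename). File.OpenRead looks for the file on the local disk (hence the local D:\ path in the exception), whereas GetReadStream opens the file stored in the Data Lake Store account.
The code is: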
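// Requires: using System; using System.IO; using System.IO.Compression; using System.Text;
foreach (var entry in client.EnumerateDirectory("/Test/"))
{
    StringBuilder lines = new StringBuilder();
    try
    {
        // Open the remote file through the ADLS client, then decompress it on the fly
        using (Stream fileStream = client.GetReadStream(entry.FullName), zippedStream = new GZipStream(fileStream, CompressionMode.Decompress))
        {
            using (StreamReader reader = new StreamReader(zippedStream))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    lines.AppendLine(line);
                }
            }
        }
        // Print the whole decompressed file once, after all lines have been read
        Console.WriteLine(lines);
    }
    catch (InvalidDataException ex)
    {
        // GZipStream throws this when the entry is not valid gzip data
        Console.WriteLine($"Could not decompress {entry.FullName}: {ex.Message}");
    }
}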
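The fix works because GZipStream can wrap any readable Stream, so the decompression logic does not care where the bytes come from. The following is a minimal sketch, not part of the ADLS SDK: it factors the read loop into a hypothetical helper, GzipLines.ReadAllLines, which you could call with client.GetReadStream(entry.FullName). The companion Compress helper exists only so the logic can be tried locally with a MemoryStream, without an Azure account. It assumes the compressed content is UTF-8 text.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helpers: decompress a gzip stream line by line, and build
// a gzip-compressed in-memory stream for local testing.
public static class GzipLines
{
    // Reads every line from a gzip-compressed source stream (for example the
    // stream returned by client.GetReadStream, or a MemoryStream).
    public static List<string> ReadAllLines(Stream compressed)
    {
        var lines = new List<string>();
        // leaveOpen defaults to false, so disposing the GZipStream also
        // disposes the source stream.
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    // Gzip-compresses UTF-8 text into a MemoryStream positioned at the start.
    public static MemoryStream Compress(string text)
    {
        var buffer = new MemoryStream();
        // leaveOpen: true keeps the MemoryStream usable after the GZipStream is
        // disposed; disposing it is what flushes the gzip footer.
        using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            gzip.Write(bytes, 0, bytes.Length);
        }
        buffer.Position = 0;
        return buffer;
    }
}
```

For example, GzipLines.ReadAllLines(GzipLines.Compress("a,1\nb,2\n")) returns the two lines "a,1" and "b,2". Returning a List keeps the sketch short; for large files, processing each line inside the loop avoids holding the whole decompressed file in memory.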