Link a TransformBlock producing IEnumerable<T> to a block that receives T
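I am writing a web gallery scraper, and I want to process the files in parallel with TPL Dataflow as much as possible.
To scrape, I first fetch the gallery main page and parse the HTML to get the image page links as a list. Then I visit each page in that list and parse its HTML to get the link to the image I want to save to disk.
Here is an outline of my program: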
var galleryBlock = new TransformBlock<Uri, IEnumerable<Uri>>(async uri =>
{
    // 1. Get the page
    // 2. Parse the page to get the urls of each image page
    return imagePageLinks;
});
var imageBlock = new TransformBlock<Uri, Uri>(async uri =>
{
    // 1. Go to the url and fetch the image page html
    // 2. Parse the html to retrieve the image url
    return imageUri;
});
var downloadBlock = new ActionBlock<Uri>(async uri =>
{
    // Download the image from uri to disk
});
var opts = new DataflowLinkOptions { PropagateCompletion = true };
galleryBlock.LinkTo(imageBlock, opts); // This doesn't work: I'm returning a list, not a single item. However, I want to process the items of that list in parallel.
imageBlock.LinkTo(downloadBlock, opts);
You can use a TransformManyBlock instead of your TransformBlock:
var galleryBlock = new TransformManyBlock<Uri, Uri>(async uri =>
{
    return Enumerable.Empty<Uri>(); // just to get it compiling
});
var imageBlock = new TransformBlock<Uri, Uri>(async uri =>
{
    return null; // just to get it compiling
});
var opts = new DataflowLinkOptions { PropagateCompletion = true };
galleryBlock.LinkTo(imageBlock, opts); // bingo!
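To see the whole pipeline end to end, here is a minimal sketch that wires all three blocks together. It assumes a runtime where System.Threading.Tasks.Dataflow is available (on .NET Framework it comes from the NuGet package of the same name). The helpers GetImagePageLinksAsync and GetImageUriAsync, the example.com addresses, and the MaxDegreeOfParallelism value of 4 are hypothetical stand-ins, not part of the question: they fake the fetching and parsing so the example runs without network access, and the "download" step only records the URI.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class GalleryPipelineSketch
{
    // Hypothetical stand-in for "fetch the gallery page and parse out the image page links".
    static Task<IEnumerable<Uri>> GetImagePageLinksAsync(Uri gallery) =>
        Task.FromResult(Enumerable.Range(1, 3).Select(i => new Uri(gallery, $"page{i}.html")));

    // Hypothetical stand-in for "fetch the image page and parse out the image url".
    static Task<Uri> GetImageUriAsync(Uri imagePage) =>
        Task.FromResult(new Uri(imagePage.AbsoluteUri.Replace(".html", ".jpg")));

    public static async Task<List<Uri>> RunAsync(IEnumerable<Uri> galleries)
    {
        var downloaded = new ConcurrentBag<Uri>();

        // Assumed setting: let each block work on up to 4 items at once.
        var parallel = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 };

        // One gallery Uri in, many image page Uris out. The block flattens the
        // returned sequence, so the next block receives single Uri items.
        var galleryBlock = new TransformManyBlock<Uri, Uri>(
            uri => GetImagePageLinksAsync(uri), parallel);

        // One image page Uri in, one image Uri out.
        var imageBlock = new TransformBlock<Uri, Uri>(
            uri => GetImageUriAsync(uri), parallel);

        // A real scraper would download the file here; this sketch only records the Uri.
        var downloadBlock = new ActionBlock<Uri>(uri => downloaded.Add(uri), parallel);

        var opts = new DataflowLinkOptions { PropagateCompletion = true };
        galleryBlock.LinkTo(imageBlock, opts);
        imageBlock.LinkTo(downloadBlock, opts);

        foreach (var gallery in galleries)
        {
            galleryBlock.Post(gallery);
        }

        // Signal that no more input is coming, then wait for the last block to drain.
        galleryBlock.Complete();
        await downloadBlock.Completion;

        return downloaded.OrderBy(u => u.AbsoluteUri, StringComparer.Ordinal).ToList();
    }

    public static async Task Main()
    {
        var images = await RunAsync(new[]
        {
            new Uri("http://example.com/a/"),
            new Uri("http://example.com/b/"),
        });

        foreach (var image in images)
        {
            Console.WriteLine(image);
        }
    }
}
```

Because PropagateCompletion is set on both links, calling Complete on the first block is enough: completion flows down the chain once each block has finished its queued items, so awaiting downloadBlock.Completion waits for the whole pipeline.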