根据关键字对文件进行排序,需要一个更数据库化的解决方案
sorting files according to keywords, need a more database-y solution
我正在制作一个脚本,通过检查文件中的已知关键字将视频文件分类到文件夹中。随着关键字数量的增长失控,脚本变得非常慢,处理每个文件需要几秒钟。
@echo off
cd /d d:\videos\shorts
if /i not "%cd%"=="d:\videos\shorts" echo invalid shorts dir. && exit /b
:: auto detect folder name via anchor file
for /r %%i in (*spirit*science*chakras*) do set conspiracies=%%~dpi
if not exist "%conspiracies%" echo conscpiracies dir missing. && pause && exit /b
for /r %%i in (*modeselektor*evil*) do set musicvideos=%%~dpi
if not exist "%musicvideos%" echo musicvideos dir missing. && pause && exit /b
for %%s in (*) do set "file=%%~nxs" & set "full=%%s" & call :count
for %%v in (*) do echo can't sort "%%~nv"
exit /b
:count
set oldfile="%file%"
set newfile=%oldfile:&=and%
if not %oldfile%==%newfile% ren "%full%" %newfile%
set count=0
set words= & rem
echo "%~n1" | findstr /i /c:"music" >nul && set words=%words%, music&& set /a count+=1
echo "%~n1" | findstr /i /c:"official video" >nul && set words=%words%, official video&& set /a count+=2
set words=%words:has, =has %
set words=%words: , =%
if not %count%==0 echo "%file%" has "%words%" %count%p for music videos
set musicvideoscount=%count%
set count=0
set words= & rem
echo "%~n1" | findstr /i /c:"misinform" >nul && set words=%words%, misinform&& set /a count+=1
echo "%~n1" | findstr /i /c:"antikythera" >nul && set words=%words%, antikythera&& set /a count+=2
set words=%words:has, =has %
set words=%words: , =%
if not %count%==0 echo "%file%" has "%words%" %count%p for conspiracies
set conspiraciescount=%count%
set wanted=3
set winner=none
:loop
:: count points and set winner (in case of tie lowest in this list wins, sort accordingly)
if %conspiraciescount%==%wanted% set winner=%conspiracies%
if %musicvideoscount%==%wanted% set winner=%musicvideos%
set /a wanted+=1
if not %wanted%==15 goto loop
if not "%winner%"=="none" move "%full%" "%winner%" >nul && echo "%winner%%file%" && echo.
注意每个关键字的 "weight value"。它计算每个类别的总分,找到最大值并将文件移动到指定给该类别的文件夹。它还会显示它找到的单词,最后列出它发现无法排序的文件,以便我可以添加关键字或调整权重值。
我已将此示例中的文件夹和关键字数量减少到最低限度。完整的脚本有六个文件夹,大小为 64k,包含所有关键字(并且还在增加)。
@ECHO OFF
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "tempfile=%temp%\somename"
SET "categories=music conspiracies"
REM SET "categories=conspiracies music"
(
FOR /f "tokens=1,2,*delims=," %%s IN (q45196316.txt) DO (
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\*%%u*" 2^>nul'
) DO (
ECHO %%a^|%%s^|%%t
)
)
)>"%tempfile%"
SET "lastname="
FOR /f "tokens=1,2,*delims=|" %%a IN ('sort "%tempfile%"') DO (
CALL :resolve %%b %%c "%%a"
)
:: and the last entry...
CALL :resolve dummy 0
GOTO :EOF
:resolve
IF "%~3" equ "%lastname%" GOTO accum
:: report and reset accumulators
IF NOT DEFINED lastname GOTO RESET
SET "winner="
SET /a maxfound=0
FOR %%v IN (%categories%) DO (
FOR /f "tokens=1,2delims=$=" %%w IN ('set $%%v') DO CALL :compare %%w %%x
)
IF DEFINED winner ECHO %winner% %lastname:&=and%
:RESET
FOR %%v IN (%categories%) DO SET /a $%%v=0
SET "lastname=%~3"
:accum
SET /a $%1+=%2
GOTO :eof
:compare
IF %2 lss %maxfound% GOTO :EOF
IF %2 gtr %maxfound% GOTO setwinner
:: equal scores use categories to determine
IF DEFINED winner GOTO :eof
:Setwinner
SET "winner=%1"
SET maxfound=%2
GOTO :eof
您需要更改 sourcedir
的设置以适合您的情况。
我使用了一个名为 q45196316.txt
的文件,其中包含我的测试类别数据。
music,6,music
music,8,Official video
conspiracies,3,misinform
conspiracies,6,antikythera
missing,0,not appearing in this directory
我相信你的问题是重复执行findstr
很耗时。
此方法使用包含 category,weight,mask
行的数据文件。 categories
变量包含按优先顺序排列的类别列表(当分数相等时)
读取数据文件,将类别分配给 %%s
,将权重分配给 %%t
,将掩码分配给 %%u
,然后使用掩码进行目录扫描。这将为找到的每个名称匹配 echo
格式 name|category|weight
的临时文件添加一行。 dir
第一次扫描后好像很快。
生成的临时文件因此每个文件名+类别加上权重都有一行,因此如果文件名属于多个类别,将创建多个条目。
然后我们扫描该文件的排序版本并解析分数。
首先,如果文件名发生变化,我们可以报告最后一个文件名。这是通过比较变量 $categoryname
中的值来完成的。由于这些是按 %categories%
的顺序扫描的,因此如果分数相等,则选择列表中的第一个类别。然后重置分数并 lastname
初始化为新文件名。
然后我们将分数累加到$categoryname
所以 - 我相信会快一点。
修订
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir"
SET "tempfile=%temp%\somename"
SET "categories="rock music" music conspiracies"
REM SET "categories=conspiracies music"
:: set up sorting categories
SET "sortingcategories="
FOR %%a IN (%categories%) DO SET "sortingcategories=!sortingcategories!,%%~a"
SET "sortingcategories=%sortingcategories: =_%"
:: Create "tempfile" containing lines of name|sortingcategory|weight
(
FOR /f "tokens=1,2,*delims=," %%s IN (q45196316.txt) DO (
SET "sortingcategory=%%s"
SET "sortingcategory=!sortingcategory: =_!"
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\*%%u*" 2^>nul'
) DO (
ECHO %%a^|!sortingcategory!^|%%t^|%%s^|%%u
)
)
)>"%tempfile%"
SET "lastname="
SORT "%tempfile%">"%tempfile%.s"
FOR /f "usebackqtokens=1,2,3delims=|" %%a IN ("%tempfile%.s") DO (
CALL :resolve %%b %%c "%%a"
)
:: and the last entry...
CALL :resolve dummy 0
GOTO :EOF
:: resolve by totalling weights (%2) in sortingcategories (%1)
:: for each name (%3)
:resolve
IF "%~3" equ "%lastname%" GOTO accum
:: report and reset accumulators
IF NOT DEFINED lastname GOTO RESET
SET "winner=none"
SET /a maxfound=0
FOR %%v IN (%sortingcategories%) DO (
FOR /f "tokens=1,2delims=$=" %%w IN ('set $%%v') DO IF %%x gtr !maxfound! (SET "winner=%%v"&SET /a maxfound=%%x)
)
ECHO %winner:_= % %lastname:&=and%
:RESET
FOR %%v IN (%sortingcategories%) DO SET /a $%%v=0
SET "lastname=%~3"
:accum
SET /a $%1+=%2
GOTO :eof
我添加了一些重要的评论。
您现在可以在类别名称中包含 space - 您需要在 set catagories...
语句中引用名称(用于报告目的)。
sortingcategories
是自动派生的 - 它仅用于排序,只是名称中任何 space 被下划线替换的类别。
在创建临时文件时,类别被处理为包含下划线(排序类别),并且在解析最终位置时,下划线被删除并返回类别名称。
现在应正确处理负权重。
-- "not append *"
的进一步修订
FOR /f "tokens=1-5delims=," %%s IN (q45196316.txt) DO (
SET "sortingcategory=%%s"
SET "sortingcategory=!sortingcategory: =_!"
FOR %%z IN ("!sortingcategory!") DO (
SETLOCAL disabledelayedexpansion
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\%%~v%%u%%~w" 2^>nul'
和
向 q45196316 文件添加 2 个额外的列
music,6,music,*,*
music,8,Official video,"",*
conspiracies,3,misinform,*,*
conspiracies,6,kythera,*anti,*
missing,0,not appearing in this directory,*,*
rock music,2,metal,*,*
conspiracies,-5,negative,*,*
for /f ... %%s
现在生成包含最后两列的 %%v
和 %%w
(因为 tokens
不是 1-5
)
这些在 dir
命令中用作 %%u
的前缀和后缀。请注意,""
应该用于 nothing,因为两个连续的 ,
被解析为单个分隔符。 %%~v
中v
/w
前的~
表示remove the quotes
.
我正在制作一个脚本,通过检查文件中的已知关键字将视频文件分类到文件夹中。随着关键字数量的增长失控,脚本变得非常慢,处理每个文件需要几秒钟。
@echo off
cd /d d:\videos\shorts
if /i not "%cd%"=="d:\videos\shorts" echo invalid shorts dir. && exit /b
:: auto detect folder name via anchor file
for /r %%i in (*spirit*science*chakras*) do set conspiracies=%%~dpi
if not exist "%conspiracies%" echo conscpiracies dir missing. && pause && exit /b
for /r %%i in (*modeselektor*evil*) do set musicvideos=%%~dpi
if not exist "%musicvideos%" echo musicvideos dir missing. && pause && exit /b
for %%s in (*) do set "file=%%~nxs" & set "full=%%s" & call :count
for %%v in (*) do echo can't sort "%%~nv"
exit /b
:count
set oldfile="%file%"
set newfile=%oldfile:&=and%
if not %oldfile%==%newfile% ren "%full%" %newfile%
set count=0
set words= & rem
echo "%~n1" | findstr /i /c:"music" >nul && set words=%words%, music&& set /a count+=1
echo "%~n1" | findstr /i /c:"official video" >nul && set words=%words%, official video&& set /a count+=2
set words=%words:has, =has %
set words=%words: , =%
if not %count%==0 echo "%file%" has "%words%" %count%p for music videos
set musicvideoscount=%count%
set count=0
set words= & rem
echo "%~n1" | findstr /i /c:"misinform" >nul && set words=%words%, misinform&& set /a count+=1
echo "%~n1" | findstr /i /c:"antikythera" >nul && set words=%words%, antikythera&& set /a count+=2
set words=%words:has, =has %
set words=%words: , =%
if not %count%==0 echo "%file%" has "%words%" %count%p for conspiracies
set conspiraciescount=%count%
set wanted=3
set winner=none
:loop
:: count points and set winner (in case of tie lowest in this list wins, sort accordingly)
if %conspiraciescount%==%wanted% set winner=%conspiracies%
if %musicvideoscount%==%wanted% set winner=%musicvideos%
set /a wanted+=1
if not %wanted%==15 goto loop
if not "%winner%"=="none" move "%full%" "%winner%" >nul && echo "%winner%%file%" && echo.
注意每个关键字的 "weight value"。它计算每个类别的总分,找到最大值并将文件移动到指定给该类别的文件夹。它还会显示它找到的单词,最后列出它发现无法排序的文件,以便我可以添加关键字或调整权重值。
我已将此示例中的文件夹和关键字数量减少到最低限度。完整的脚本有六个文件夹,大小为 64k,包含所有关键字(并且还在增加)。
@ECHO OFF
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "tempfile=%temp%\somename"
SET "categories=music conspiracies"
REM SET "categories=conspiracies music"
(
FOR /f "tokens=1,2,*delims=," %%s IN (q45196316.txt) DO (
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\*%%u*" 2^>nul'
) DO (
ECHO %%a^|%%s^|%%t
)
)
)>"%tempfile%"
SET "lastname="
FOR /f "tokens=1,2,*delims=|" %%a IN ('sort "%tempfile%"') DO (
CALL :resolve %%b %%c "%%a"
)
:: and the last entry...
CALL :resolve dummy 0
GOTO :EOF
:resolve
IF "%~3" equ "%lastname%" GOTO accum
:: report and reset accumulators
IF NOT DEFINED lastname GOTO RESET
SET "winner="
SET /a maxfound=0
FOR %%v IN (%categories%) DO (
FOR /f "tokens=1,2delims=$=" %%w IN ('set $%%v') DO CALL :compare %%w %%x
)
IF DEFINED winner ECHO %winner% %lastname:&=and%
:RESET
FOR %%v IN (%categories%) DO SET /a $%%v=0
SET "lastname=%~3"
:accum
SET /a $%1+=%2
GOTO :eof
:compare
IF %2 lss %maxfound% GOTO :EOF
IF %2 gtr %maxfound% GOTO setwinner
:: equal scores use categories to determine
IF DEFINED winner GOTO :eof
:Setwinner
SET "winner=%1"
SET maxfound=%2
GOTO :eof
您需要更改 sourcedir
的设置以适合您的情况。
我使用了一个名为 q45196316.txt
的文件,其中包含我的测试类别数据。
music,6,music
music,8,Official video
conspiracies,3,misinform
conspiracies,6,antikythera
missing,0,not appearing in this directory
我相信你的问题是重复执行findstr
很耗时。
此方法使用包含 category,weight,mask
行的数据文件。 categories
变量包含按优先顺序排列的类别列表(当分数相等时)
读取数据文件,将类别分配给 %%s
,将权重分配给 %%t
,将掩码分配给 %%u
,然后使用掩码进行目录扫描。这将为找到的每个名称匹配 echo
格式 name|category|weight
的临时文件添加一行。 dir
第一次扫描后好像很快。
生成的临时文件因此每个文件名+类别加上权重都有一行,因此如果文件名属于多个类别,将创建多个条目。
然后我们扫描该文件的排序版本并解析分数。
首先,如果文件名发生变化,我们可以报告最后一个文件名。这是通过比较变量 $categoryname
中的值来完成的。由于这些是按 %categories%
的顺序扫描的,因此如果分数相等,则选择列表中的第一个类别。然后重置分数并 lastname
初始化为新文件名。
然后我们将分数累加到$categoryname
所以 - 我相信会快一点。
修订
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir"
SET "tempfile=%temp%\somename"
SET "categories="rock music" music conspiracies"
REM SET "categories=conspiracies music"
:: set up sorting categories
SET "sortingcategories="
FOR %%a IN (%categories%) DO SET "sortingcategories=!sortingcategories!,%%~a"
SET "sortingcategories=%sortingcategories: =_%"
:: Create "tempfile" containing lines of name|sortingcategory|weight
(
FOR /f "tokens=1,2,*delims=," %%s IN (q45196316.txt) DO (
SET "sortingcategory=%%s"
SET "sortingcategory=!sortingcategory: =_!"
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\*%%u*" 2^>nul'
) DO (
ECHO %%a^|!sortingcategory!^|%%t^|%%s^|%%u
)
)
)>"%tempfile%"
SET "lastname="
SORT "%tempfile%">"%tempfile%.s"
FOR /f "usebackqtokens=1,2,3delims=|" %%a IN ("%tempfile%.s") DO (
CALL :resolve %%b %%c "%%a"
)
:: and the last entry...
CALL :resolve dummy 0
GOTO :EOF
:: resolve by totalling weights (%2) in sortingcategories (%1)
:: for each name (%3)
:resolve
IF "%~3" equ "%lastname%" GOTO accum
:: report and reset accumulators
IF NOT DEFINED lastname GOTO RESET
SET "winner=none"
SET /a maxfound=0
FOR %%v IN (%sortingcategories%) DO (
FOR /f "tokens=1,2delims=$=" %%w IN ('set $%%v') DO IF %%x gtr !maxfound! (SET "winner=%%v"&SET /a maxfound=%%x)
)
ECHO %winner:_= % %lastname:&=and%
:RESET
FOR %%v IN (%sortingcategories%) DO SET /a $%%v=0
SET "lastname=%~3"
:accum
SET /a $%1+=%2
GOTO :eof
我添加了一些重要的评论。
您现在可以在类别名称中包含 space - 您需要在 set catagories...
语句中引用名称(用于报告目的)。
sortingcategories
是自动派生的 - 它仅用于排序,只是名称中任何 space 被下划线替换的类别。
在创建临时文件时,类别被处理为包含下划线(排序类别),并且在解析最终位置时,下划线被删除并返回类别名称。
现在应正确处理负权重。
-- "not append *"
的进一步修订 FOR /f "tokens=1-5delims=," %%s IN (q45196316.txt) DO (
SET "sortingcategory=%%s"
SET "sortingcategory=!sortingcategory: =_!"
FOR %%z IN ("!sortingcategory!") DO (
SETLOCAL disabledelayedexpansion
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\%%~v%%u%%~w" 2^>nul'
和
向 q45196316 文件添加 2 个额外的列
music,6,music,*,*
music,8,Official video,"",*
conspiracies,3,misinform,*,*
conspiracies,6,kythera,*anti,*
missing,0,not appearing in this directory,*,*
rock music,2,metal,*,*
conspiracies,-5,negative,*,*
for /f ... %%s
现在生成包含最后两列的 %%v
和 %%w
(因为 tokens
不是 1-5
)
这些在 dir
命令中用作 %%u
的前缀和后缀。请注意,""
应该用于 nothing,因为两个连续的 ,
被解析为单个分隔符。 %%~v
中v
/w
前的~
表示remove the quotes
.