I have 200 text files in Hindi. I want to remove the extra whitespace and the special characters, and then find the bigrams and trigrams in Python.
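import os

cwd = os.getcwd()
print(cwd)
dir1 = os.path.join(cwd, "test")  # folder holding the 200 Hindi files
filename = os.listdir(dir1)
bad_chars = [';', ':', '!', "*", "#", "%"]

for i in filename:
    filepath = os.path.join(dir1, i)  # full path of the current file
    # read the whole file first; opening it in "w" mode would truncate it
    with open(filepath, "r", encoding="utf8") as file:
        read_ = file.read()
    fields = read_.split()  # split on any whitespace, not only " "
    print(fields)
    for j in range(len(fields)):
        for p in bad_chars:
            # str.replace returns a new string, so assign the result back,
            # and replace the bad character p (not the filename i)
            fields[j] = fields[j].replace(p, ' ')
        print("Resultant list is : ", fields[j])
    # write the cleaned words back to the same file, separated by spaces
    with open(filepath, "w", encoding="utf8") as file:
        file.write(" ".join(fields))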
I am trying to remove the special characters from all 200 text files.
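One way to do this cleanup on a single string is sketched below, assuming the same `bad_chars` list as in the question. It maps each bad character to a space with `str.maketrans`/`str.translate`, then collapses every run of whitespace (spaces, tabs, newlines) into a single space with `split()` and `join()`. The helper name `clean_text` is hypothetical. If your Hindi files also use the danda `।` (U+0964) as a full stop, you may want to add it to the list; that is an assumption about your data.

```python
# Characters to strip, taken from the question's bad_chars list
BAD_CHARS = [';', ':', '!', '*', '#', '%']

# Translation table mapping each bad character to a space
_TABLE = str.maketrans({c: ' ' for c in BAD_CHARS})


def clean_text(text):
    """Replace bad characters with spaces and collapse all whitespace.

    clean_text is a hypothetical helper name, not from the original post.
    """
    replaced = text.translate(_TABLE)
    # str.split() with no argument splits on any run of Unicode whitespace,
    # so joining the pieces with ' ' normalizes the spacing.
    return ' '.join(replaced.split())


if __name__ == '__main__':
    print(clean_text('मेरा  नाम; ईशान है!'))  # prints: मेरा नाम ईशान है
```

Because `str.split()` only splits on whitespace, the Devanagari words themselves, including their vowel signs, are left intact.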
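For the bigrams, this is the output I am looking for. For example, for the text "my name is eshan." the output would be:
my name occurs 1
name is occurs 1
is eshan occurs 1
Depending on the text, the number of occurrences can be greater than 1.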
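Counting the bigrams and trigrams described above can be sketched with `zip` and `collections.Counter`, assuming the text is already cleaned and that words are separated by whitespace. The helper names `ngrams` and `count_ngrams` are hypothetical; if you already use NLTK, `nltk.util.ngrams` produces the same tuples.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token list. Hypothetical helper."""
    # zip over n staggered views of the list: tokens[0:], tokens[1:], ...
    # zip stops at the shortest view, so no partial n-grams are produced.
    return list(zip(*(tokens[i:] for i in range(n))))


def count_ngrams(text, n):
    """Count the n-grams of whitespace-tokenized text. Hypothetical helper."""
    tokens = text.split()
    return Counter(ngrams(tokens, n))


if __name__ == '__main__':
    text = 'my name is eshan'
    for gram, count in count_ngrams(text, 2).items():
        print(' '.join(gram), 'occurs', count)
    # my name occurs 1
    # name is occurs 1
    # is eshan occurs 1
```

Calling `count_ngrams(text, 3)` gives the trigrams in the same way; a count goes above 1 whenever the same word sequence repeats in the text.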
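Try this approach:
for file in filename:
    filepath = os.path.join(dir1, file)
    # read the current file (not a fixed 'inp.txt')
    with open(filepath, 'r', encoding='utf8') as f:
        texts = f.read()
    # replace every special character with a space
    for c in bad_chars:
        texts = texts.replace(c, ' ')
    # write the cleaned text back to the same file
    with open(filepath, 'w', encoding='utf8') as f:
        f.write(texts)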
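Putting both steps together for the whole `test` folder could look like the sketch below. It assumes every file is UTF-8 and, like the answer above, overwrites each file with its cleaned text, so keep a backup of the 200 files before running it. Counts are summed across all files, and n-grams never span two files. `clean_text` and `count_folder_ngrams` are hypothetical names, and printing the top 20 with `Counter.most_common` is an arbitrary choice.

```python
import os
from collections import Counter

# Same characters as the bad_chars list in the question
BAD_CHARS = [';', ':', '!', '*', '#', '%']


def clean_text(text):
    """Replace each bad character with a space, then collapse whitespace."""
    for c in BAD_CHARS:
        text = text.replace(c, ' ')
    return ' '.join(text.split())


def count_folder_ngrams(folder):
    """Clean every file in `folder` in place and return (bigrams, trigrams).

    Hypothetical helper: counts are summed over all files.
    """
    bigrams, trigrams = Counter(), Counter()
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        with open(path, 'r', encoding='utf8') as f:
            cleaned = clean_text(f.read())
        # Overwrite the file with the cleaned text
        with open(path, 'w', encoding='utf8') as f:
            f.write(cleaned)
        tokens = cleaned.split()
        bigrams.update(zip(tokens, tokens[1:]))
        trigrams.update(zip(tokens, tokens[1:], tokens[2:]))
    return bigrams, trigrams


if __name__ == '__main__':
    folder = os.path.join(os.getcwd(), 'test')
    if os.path.isdir(folder):
        bigrams, trigrams = count_folder_ngrams(folder)
        for gram, count in bigrams.most_common(20):
            print(' '.join(gram), 'occurs', count)
        for gram, count in trigrams.most_common(20):
            print(' '.join(gram), 'occurs', count)
```

Counting per file and merging with `Counter.update` keeps memory use proportional to the number of distinct n-grams rather than to the total size of all 200 files.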