CAPTCHA Recognition with ddddocr
Why did I ever get into web scraping? Luckily ddddocr (带带弟弟OCR) saved the day.
ONNX
On macOS with an M1 chip, onnxruntime has to be installed first:
brew install onnxruntime
DDDDOCR
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple ddddocr
Usage
import ddddocr
ocr = ddddocr.DdddOcr(show_ad=False)  # show_ad=False hides the start-up banner
with open("test.png", "rb") as f:
    image = f.read()  # raw image bytes
res = ocr.classification(image)  # recognized text
print(res)
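When there is a whole folder of CAPTCHA images, the same call can be looped. The following is only a sketch: recognize_folder is a hypothetical helper name, and the recognizer is passed in as a plain callable so the loop does not depend on ddddocr itself.

```python
from pathlib import Path


def recognize_folder(folder, classify, pattern="*.png"):
    """Return {file name: recognized text} for every matching image in folder.

    classify is any callable that takes raw image bytes and returns a string,
    for example the classification method of a ddddocr.DdddOcr instance.
    """
    results = {}
    # sorted() keeps the output order stable across runs
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = classify(path.read_bytes())
    return results
```

With the ocr object created above, a call would look like recognize_folder("captchas", ocr.classification), where "captchas" is a placeholder folder name. Creating the DdddOcr instance once and reusing it avoids reloading the model for every image.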
Errors
AttributeError: module 'PIL.Image' has no attribute 'ANTIALIAS'
Newer Pillow releases (10.0 and later) removed ANTIALIAS. Downgrading Pillow fixes it:
pip install Pillow==9.5.0
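If downgrading is not an option, a commonly used alternative is to put the alias back at runtime. This is a sketch, not something the ddddocr docs prescribe: ensure_antialias is a hypothetical helper name, and it assumes the image module still exposes LANCZOS (the filter that ANTIALIAS used to alias), which Pillow 10 does.

```python
def ensure_antialias(image_module):
    """Restore the ANTIALIAS alias on an image module that no longer has it.

    Does nothing if the attribute already exists (older Pillow versions).
    """
    if not hasattr(image_module, "ANTIALIAS"):
        # ANTIALIAS was just another name for the LANCZOS resampling filter
        image_module.ANTIALIAS = image_module.LANCZOS
    return image_module
```

To apply it, run from PIL import Image followed by ensure_antialias(Image) before calling ocr.classification. Upgrading ddddocr to a release that no longer references ANTIALIAS, if one is available to you, avoids the patch entirely.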
Training
Installing PyTorch
Check the official PyTorch site for the version your environment supports.
dddd_trainer
Install dddd_trainer and initialize it. See the official GitHub repository, which is already quite detailed.
Commands
python app.py create ztest
python app.py cache ztest /opt/dddd-ocr/images_set
python app.py train ztest
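If you hit the error below, the dataset is probably too small. The log reports 4 cached samples and then 0 in cache.val.tmp, so a DataLoader ends up being built over an empty dataset.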
Read Cache File End! Caches Num is 4.
2023-08-04 15:23:28.176 | INFO | utils.load_cache:__init__:25 - Reading Cache File... ----> /opt/dddd-ocr/dddd_trainer/projects/ztest/cache/cache.val.tmp
2023-08-04 15:23:28.176 | INFO | utils.load_cache:__init__:30 - Read Cache File End! Caches Num is 0.
Traceback (most recent call last):
  File "app.py", line 33, in <module>
    fire.Fire(App)
  File "/opt/dddd-ocr/venv/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/dddd-ocr/venv/lib/python3.7/site-packages/fire/core.py", line 480, in _Fire
    target=component.__name__)
  File "/opt/dddd-ocr/venv/lib/python3.7/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "app.py", line 27, in train
    trainer = train.Train(project_name)
  File "/opt/dddd-ocr/dddd_trainer/utils/train.py", line 83, in __init__
    loaders = load_cache.GetLoader(project_name)
  File "/opt/dddd-ocr/dddd_trainer/utils/load_cache.py", line 147, in __init__
    num_workers=0, collate_fn=self.collate_to_sparse),
  File "/opt/dddd-ocr/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 344, in __init__
    sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
  File "/opt/dddd-ocr/venv/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 108, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
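A quick pre-flight check before running the cache step can catch this early. The sketch below rests on an assumption: that the validation set is carved out as a fixed fraction of the samples, rounded down. The helper names and the 0.03 ratio used in the example are illustrative only, so check the ratio in your own project's config.

```python
from pathlib import Path

# File suffixes counted as training images (an assumption; adjust as needed)
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".gif"}


def count_images(folder):
    """Count image files directly inside folder (no recursion)."""
    return sum(
        1
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def expected_val_samples(total, val_ratio):
    """Validation samples left after rounding down (assumed split rule)."""
    return int(total * val_ratio)


def check_dataset(folder, val_ratio):
    """Raise early if the validation split would come out empty."""
    total = count_images(folder)
    if expected_val_samples(total, val_ratio) < 1:
        raise ValueError(
            f"{total} images with val_ratio={val_ratio} leaves no validation samples"
        )
    return total
```

Under that assumed rule, 4 images at a ratio of 0.03 round down to zero validation samples, which matches the empty cache.val.tmp in the log above. Calling check_dataset("/opt/dddd-ocr/images_set", 0.03) before python app.py cache would flag it.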
Once training finishes, run ddddocr with the exported offline model.
Example from the official GitHub repository:
ocr = ddddocr.DdddOcr(det=False, ocr=False, import_onnx_path="myproject_0.984375_139_13000_2022-02-26-15-34-13.onnx", charsets_path="charsets.json")