Batch Image Translation Demo Using Youdao OCR API with Python
This article presents a step‑by‑step Python demo that uses Youdao's OCR translation API to batch‑process cosmetic product images, covering API key setup, request parameters, signature generation, GUI implementation with Tkinter, and code snippets for file selection, result storage, and API invocation.
The author needed to translate cosmetic product labels for a girlfriend and chose to build a batch image translation tool on Youdao's OCR translation service rather than train a model from scratch.
The results section shows screenshots of the translated output for several product images; keywords and mixed‑language text are recognized correctly.
API preparation explains that an application ID and secret must be created on the Youdao platform, then bound to an instance to obtain the credentials needed for API calls.
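Once the application ID and secret exist, they have to reach the script somehow. The article's snippets use module-level `APP_KEY` and `APP_SECRET` constants; the sketch below shows one alternative that keeps them out of the source file. The environment variable names `YOUDAO_APP_KEY` and `YOUDAO_APP_SECRET` and the helper `load_credentials` are hypothetical, chosen only for this illustration.

```python
import os


def load_credentials(env=os.environ):
    """Return (app_key, app_secret) read from environment variables.

    YOUDAO_APP_KEY / YOUDAO_APP_SECRET are hypothetical names for this sketch,
    not something the Youdao platform or the article prescribes.
    """
    app_key = env.get("YOUDAO_APP_KEY")
    app_secret = env.get("YOUDAO_APP_SECRET")
    if not app_key or not app_secret:
        raise RuntimeError("Set YOUDAO_APP_KEY and YOUDAO_APP_SECRET first")
    return app_key, app_secret
```

Calling `APP_KEY, APP_SECRET = load_credentials()` at the top of the API module would then supply the two constants the later snippets rely on.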
API interface description lists the endpoint (https://openapi.youdao.com/ocrtransapi), request method (POST), request format (form), and response format (JSON). A table of required parameters follows, including type, from, to, appKey, salt, sign, q, etc., with notes on defaults and required values.
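The article's own `do_request` helper is not shown, so the following is only a sketch of how a form-encoded POST to that endpoint could be assembled. It uses the standard library's `urllib` as a stand-in; `build_request` and the demo field values are assumptions made for illustration, and the request is built but not sent.

```python
import urllib.parse
import urllib.request

OCR_TRANS_URL = "https://openapi.youdao.com/ocrtransapi"


def build_request(fields):
    """Encode `fields` as a form body and wrap it in a POST request (not sent)."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        OCR_TRANS_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values only; real calls need valid credentials and a real signature.
fields = {
    "type": "1",
    "from": "auto",
    "to": "auto",
    "appKey": "demo-key",
    "salt": "demo-salt",
    "sign": "demo-sign",
    "q": "aGVsbG8=",  # Base64 text standing in for an encoded image
}
req = build_request(fields)
# urllib.request.urlopen(req) would send it; omitted here on purpose.
```

Form encoding matters for `q`: Base64 output can contain `+`, `/`, and `=`, which `urlencode` percent-escapes so the server receives the image data intact.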
Signature generation is detailed in a blockquote: concatenate appKey, the Base64‑encoded image q, salt, and the secret key, in that order, then compute the MD5 hash to obtain sign.
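The signature rule can be sketched in a few lines. The article calls an `encrypt` helper whose body is not shown; `make_sign` below is a hypothetical equivalent that follows the stated recipe (MD5 over appKey + q + salt + secret key), and the key, secret, and image bytes are placeholders.

```python
import base64
import hashlib
import uuid


def make_sign(app_key, q, salt, app_secret):
    """Return the hex MD5 digest of appKey + q + salt + secret key."""
    raw = app_key + q + salt + app_secret
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# Placeholder inputs: a real call would Base64-encode the bytes of an image file.
q = base64.b64encode(b"fake image bytes").decode("utf-8")
salt = str(uuid.uuid1())  # any unique string works as the salt; the article uses uuid1
sign = make_sign("demo-key", q, salt, "demo-secret")
```

Because `salt` changes on every call, the same image produces a different `sign` each time, so a captured signature cannot simply be reused with another salt.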
Output result describes the JSON fields returned by the API, such as orientation, lanFrom, textAngle, errorCode, lanTo, and the resRegions array containing bounding boxes, line counts, original text, and translated content.
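A minimal sketch of reading such a response follows. It touches only `errorCode`, `resRegions`, and `tranContent`, the fields the article's code and field list mention. Treating the string `"0"` as the success code is an assumption here, and both `extract_translation` and the sample payload are made up for illustration.

```python
import json


def extract_translation(response_text):
    """Return the translated text of all regions, one region per line."""
    result = json.loads(response_text)
    # Assumption: "0" means success; anything else is reported as an error.
    if result.get("errorCode") != "0":
        raise ValueError("API returned errorCode " + str(result.get("errorCode")))
    return "\n".join(region["tranContent"] for region in result.get("resRegions", []))


# Hand-written sample shaped like the fields described above, not real API output.
sample = json.dumps({
    "errorCode": "0",
    "lanFrom": "en",
    "lanTo": "zh-CHS",
    "resRegions": [
        {"tranContent": "Moisturizing cream"},
        {"tranContent": "Net weight 50 g"},
    ],
})
text = extract_translation(sample)
```

Checking `errorCode` before indexing into `resRegions` turns a failed call into a readable error instead of a `KeyError`.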
Development process outlines three Python files: maindow.py (Tkinter GUI), transclass.py (image handling and logic), and pictranslate.py (API call). The GUI includes buttons for selecting images, choosing a result folder, and triggering translation.
Code snippets:
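<code>import tkinter as tk
from tkinter import filedialog, messagebox

# Main window: two picker buttons, two text boxes, and one action button.
# The callbacks get_files, set_result_path, and translate_files (shown in the
# following snippets) must be defined before this code runs.
root = tk.Tk()
root.title("netease youdao translation test")
frm = tk.Frame(root)
frm.grid(padx='50', pady='50')
btn_get_file = tk.Button(frm, text='Select images to translate', command=get_files)
btn_get_file.grid(row=0, column=0, ipadx='3', ipady='3', padx='10', pady='20')
text1 = tk.Text(frm, width='40', height='10')
text1.grid(row=0, column=1)
btn_get_result_path = tk.Button(frm, text='Select result folder', command=set_result_path)
btn_get_result_path.grid(row=1, column=0)
text2 = tk.Text(frm, width='40', height='2')
text2.grid(row=1, column=1)
btn_sure = tk.Button(frm, text="Translate", command=translate_files)
btn_sure.grid(row=2, column=1)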
root.mainloop()</code> <code>def get_files():
    # Let the user pick one or more JPEG images and list them in the first text box.
    files = filedialog.askopenfilenames(filetypes=[('JPEG images', '.jpg')])
    translate.file_paths = files
    if files:
        for file in files:
            text1.insert(tk.END, file + '\n')
            text1.update()
    else:
        print('No files were selected')
</code> <code>def set_result_path():
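    result_path = filedialog.askdirectory()
    translate.result_root_path = result_path
    # Clear the box first so choosing a folder twice does not append to the old path.
    text2.delete('1.0', tk.END)
    text2.insert(tk.END, result_path)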
</code> <code>def translate_files():
    if translate.file_paths:
        translate.translate_files()
        tk.messagebox.showinfo("Notice", "Done")
    else:
        tk.messagebox.showinfo("Notice", "No files selected")
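</code> <code>import os
from pictranslate import connect  # the API call lives in pictranslate.py

class Translate:
    def __init__(self, name, file_paths, result_root_path, trans_type):
        self.name = name
        self.file_paths = file_paths              # paths of the images to translate
        self.result_root_path = result_root_path  # folder where results are stored
        self.trans_type = trans_type

    def translate_files(self):
        for file_path in self.file_paths:
            file_name = os.path.basename(file_path)
            print('===========' + file_path + '===========')
            trans_result = self.translate_use_netease(file_path)
            # Write result_<image name>.txt; splitext keeps any dots inside the name.
            out_name = 'result_' + os.path.splitext(file_name)[0] + '.txt'
            out_path = os.path.join(self.result_root_path, out_name)
            with open(out_path, 'w', encoding='utf-8') as out:
                out.write(trans_result)

    def translate_use_netease(self, file_path):
        # connect() takes the image path plus source and target languages.
        return connect(file_path, 'auto', 'auto')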
</code> <code>def connect(file_content, fromLan='auto', toLan='auto'):
    # Needs base64, json, and uuid imported, plus APP_KEY, APP_SECRET,
    # encrypt(), and do_request() defined elsewhere in pictranslate.py.
    with open(file_content, 'rb') as f:  # open the image file in binary mode
        q = base64.b64encode(f.read()).decode('utf-8')  # Base64-encode the file contents
    salt = str(uuid.uuid1())
    sign = encrypt(APP_KEY + q + salt + APP_SECRET)
    data = {
        'from': fromLan,
        'to': toLan,
        'type': '1',
        'q': q,
        'appKey': APP_KEY,
        'salt': salt,
        'sign': sign,
    }
    response = do_request(data)
    result = json.loads(str(response.content, encoding="utf-8"))
    # Join the translated text of every recognized region, one region per line.
    pictransresult = ""
    for region in result['resRegions']:
        pictransresult = pictransresult + region['tranContent'] + "\n"
    return pictransresult
</code>
The conclusion remarks that leveraging an open platform made image recognition and natural language processing straightforward, allowing the developer to focus on showcasing the results rather than building models.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.