Byaidu committed Dec 30, 2024
2 parents 05ac241 + 596d4e3 commit b7b7e50
Showing 7 changed files with 65 additions and 3 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -45,6 +45,7 @@ For details on how to contribute, please consult the [Contribution Guide](https:

<h2 id="updates">Updates</h2>

+- [Dec. 24 2024] The translator now supports local models on [Xinference](https://github.com/xorbitsai/inference) _(by [@imClumsyPanda](https://github.com/imClumsyPanda))_
- [Dec. 19 2024] Non-PDF/A documents are now supported using `-cp` _(by [@reycn](https://github.com/reycn))_
- [Dec. 13 2024] Additional support for backend by _(by [@YadominJinta](https://github.com/YadominJinta))_
- [Dec. 10 2024] The translator now supports OpenAI models on Azure _(by [@yidasanqian](https://github.com/yidasanqian))_
3 changes: 2 additions & 1 deletion docs/ADVANCED.md
@@ -48,12 +48,13 @@ pdf2zh example.pdf -li en -lo ja
We've provided a detailed table on the required [environment variables](https://chatgpt.com/share/6734a83d-9d48-800e-8a46-f57ca6e8bcb4) for each translation service. Make sure to set them before using the respective service.

| **Translator** | **Service** | **Environment Variables** | **Default Values** | **Notes** |
| -------------------- | -------------- | --------------------------------------------------------------------- | -------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|----------------------|----------------|-----------------------------------------------------------------------|----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Google (Default)** | `google` | None | N/A | None |
| **Bing** | `bing` | None | N/A | None |
| **DeepL** | `deepl` | `DEEPL_AUTH_KEY` | `[Your Key]` | See [DeepL](https://support.deepl.com/hc/en-us/articles/360020695820-API-Key-for-DeepL-s-API) |
| **DeepLX** | `deeplx` | `DEEPLX_ENDPOINT` | `https://api.deepl.com/translate` | See [DeepLX](https://github.com/OwO-Network/DeepLX) |
| **Ollama** | `ollama` | `OLLAMA_HOST`, `OLLAMA_MODEL` | `http://127.0.0.1:11434`, `gemma2` | See [Ollama](https://github.com/ollama/ollama) |
+| **Xinference** | `xinference` | `XINFERENCE_HOST`, `XINFERENCE_MODEL` | `http://127.0.0.1:9997`, `gemma-2-it` | See [Xinference](https://github.com/xorbitsai/inference) |
| **OpenAI** | `openai` | `OPENAI_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_MODEL` | `https://api.openai.com/v1`, `[Your Key]`, `gpt-4o-mini` | See [OpenAI](https://platform.openai.com/docs/overview) |
| **AzureOpenAI** | `azure-openai` | `AZURE_OPENAI_BASE_URL`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_MODEL` | `[Your Endpoint]`, `[Your Key]`, `gpt-4o-mini` | See [Azure OpenAI](https://learn.microsoft.com/zh-cn/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line%2Cjavascript-keyless%2Ctypescript-keyless%2Cpython&pivots=programming-language-python) |
| **Zhipu** | `zhipu` | `ZHIPU_API_KEY`, `ZHIPU_MODEL` | `[Your Key]`, `glm-4-flash` | See [Zhipu](https://open.bigmodel.cn/dev/api/thirdparty-frame/openai-sdk) |
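
As a rough illustration of the table above, the sketch below fills in the two Xinference variables with their documented defaults and builds a `pdf2zh` command line. The `-s` service flag is an assumption (it does not appear in this diff), the `name:model` form is inferred from the `service.split(":", 1)` parsing in `pdf2zh/converter.py`, and `build_pdf2zh_command` is a hypothetical helper. Nothing is executed here.

```python
import os

# Defaults copied from the Xinference row of the table above.
XINFERENCE_DEFAULTS = {
    "XINFERENCE_HOST": "http://127.0.0.1:9997",
    "XINFERENCE_MODEL": "gemma-2-it",
}


def build_pdf2zh_command(pdf_path, model=None, base_env=None):
    """Return (argv, env) for a hypothetical pdf2zh run against Xinference.

    Assumption: the service is selected with `-s`, optionally as `name:model`.
    """
    env = dict(os.environ if base_env is None else base_env)
    for key, default in XINFERENCE_DEFAULTS.items():
        env.setdefault(key, default)  # keep values that are already set
    service = "xinference" if model is None else f"xinference:{model}"
    argv = ["pdf2zh", pdf_path, "-s", service]
    return argv, env


argv, env = build_pdf2zh_command("example.pdf", base_env={})
print(argv)
print(env["XINFERENCE_HOST"], env["XINFERENCE_MODEL"])
```

The returned pair could then be handed to `subprocess.run(argv, env=env)` once an Xinference server is reachable at `XINFERENCE_HOST`.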
1 change: 1 addition & 0 deletions docs/README_zh-CN.md
@@ -45,6 +45,7 @@

<h2 id="updates">Recent Updates</h2>

+- [Dec. 24 2024] The translator now supports local LLMs served by [Xinference](https://github.com/xorbitsai/inference) _(by [@imClumsyPanda](https://github.com/imClumsyPanda))_
- [Nov. 26 2024] The CLI now supports (multiple) online PDF files *(by [@reycn](https://github.com/reycn))*
- [Nov. 24 2024] [ONNX](https://github.com/onnx/onnx) support added to reduce dependency size *(by [@Wybxc](https://github.com/Wybxc))*
- [Nov. 23 2024] 🌟 [Free public service](#demo) is now online! *(by [@Byaidu](https://github.com/Byaidu))*
3 changes: 2 additions & 1 deletion pdf2zh/converter.py
@@ -34,6 +34,7 @@
     TencentTranslator,
     DifyTranslator,
     AnythingLLMTranslator,
+    XinferenceTranslator,
     ArgosTranslator,
 )
 from pymupdf import Font
@@ -149,7 +150,7 @@ def __init__(
         param = service.split(":", 1)
         service_name = param[0]
         service_model = param[1] if len(param) > 1 else None
-        for translator in [GoogleTranslator, BingTranslator, DeepLTranslator, DeepLXTranslator, OllamaTranslator, AzureOpenAITranslator,
+        for translator in [GoogleTranslator, BingTranslator, DeepLTranslator, DeepLXTranslator, OllamaTranslator, XinferenceTranslator, AzureOpenAITranslator,
                            OpenAITranslator, ZhipuTranslator, ModelScopeTranslator, SiliconTranslator, GeminiTranslator, AzureTranslator, TencentTranslator, DifyTranslator, AnythingLLMTranslator, ArgosTranslator]:
             if service_name == translator.name:
                 self.translator = translator(lang_in, lang_out, service_model, envs=envs, prompt=prompt)
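
The lookup in this hunk can be sketched in isolation as follows. This is a minimal sketch, not the project's code: `FakeGoogle`, `FakeXinference`, `parse_service`, and `pick_translator` are hypothetical names, and raising `ValueError` for an unknown service is an assumption, since the diff does not show what happens when no translator matches.

```python
class FakeGoogle:
    """Hypothetical stand-in for a translator class; only `name` matters here."""
    name = "google"


class FakeXinference:
    """Hypothetical stand-in for XinferenceTranslator."""
    name = "xinference"


def parse_service(service):
    """Split 'name:model' into (name, model); model is None when omitted."""
    # maxsplit=1 keeps any further colons inside the model part.
    param = service.split(":", 1)
    service_name = param[0]
    service_model = param[1] if len(param) > 1 else None
    return service_name, service_model


def pick_translator(service, candidates):
    """Return (translator_class, model) for the first class whose name matches."""
    service_name, service_model = parse_service(service)
    for translator in candidates:
        if service_name == translator.name:
            return translator, service_model
    # Assumption: the real code may handle an unknown service differently.
    raise ValueError(f"Unsupported translation service: {service_name!r}")


cls, model = pick_translator("xinference:gemma-2-it", [FakeGoogle, FakeXinference])
print(cls.name, model)
```

Because the split uses `maxsplit=1`, a model identifier that itself contains a colon stays intact in the second element.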
2 changes: 2 additions & 0 deletions pdf2zh/gui.py
@@ -31,6 +31,7 @@
     OpenAITranslator,
     SiliconTranslator,
     TencentTranslator,
+    XinferenceTranslator,
     ZhipuTranslator,
 )

@@ -41,6 +42,7 @@
     "DeepL": DeepLTranslator,
     "DeepLX": DeepLXTranslator,
     "Ollama": OllamaTranslator,
+    "Xinference": XinferenceTranslator,
     "AzureOpenAI": AzureOpenAITranslator,
     "OpenAI": OpenAITranslator,
     "Zhipu": ZhipuTranslator,
57 changes: 56 additions & 1 deletion pdf2zh/translator.py
@@ -7,6 +7,7 @@
import deepl
import ollama
import openai
+import xinference_client
import requests
from pdf2zh.cache import TranslationCache
from azure.ai.translation.text import TextTranslationClient
@@ -278,6 +279,57 @@ def do_translate(self, text):
raise Exception("All models failed")


+class XinferenceTranslator(BaseTranslator):
+    # https://github.com/xorbitsai/inference
+    name = "xinference"
+    envs = {
+        "XINFERENCE_HOST": "http://127.0.0.1:9997",
+        "XINFERENCE_MODEL": "gemma-2-it",
+    }
+    CustomPrompt = True
+
+    def __init__(self, lang_in, lang_out, model, envs=None, prompt=None):
+        self.set_envs(envs)
+        if not model:
+            model = self.envs["XINFERENCE_MODEL"]
+        super().__init__(lang_in, lang_out, model)
+        self.options = {"temperature": 0}  # Random sampling may break formula markers
+        self.client = xinference_client.RESTfulClient(self.envs["XINFERENCE_HOST"])
+        self.prompttext = prompt
+        self.add_cache_impact_parameters("temperature", self.options["temperature"])
+        if prompt:
+            self.add_cache_impact_parameters("prompt", prompt)
+
+    def do_translate(self, text):
+        maxlen = max(2000, len(text) * 5)
+        for model in self.model.split(";"):
+            try:
+                xf_model = self.client.get_model(model)
+                xf_prompt = self.prompt(text, self.prompttext)
+                xf_prompt = [
+                    {
+                        "role": "user",
+                        "content": xf_prompt[0]["content"]
+                        + "\n"
+                        + xf_prompt[1]["content"],
+                    }
+                ]
+                response = xf_model.chat(
+                    generate_config=self.options,
+                    messages=xf_prompt,
+                )
+
+                response = response["choices"][0]["message"]["content"].replace(
+                    "<end_of_turn>", ""
+                )
+                if len(response) > maxlen:
+                    raise Exception("Response too long")
+                return response.strip()
+            except Exception as e:
+                print(e)
+        raise Exception("All models failed")
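
The control flow of `do_translate` above (merge the two prompt messages into one user message, try each `;`-separated model in turn, reject overly long replies) can be exercised without a running Xinference server. The sketch below is a simplified stand-in, not the project's implementation: `merge_messages`, `translate_with_fallback`, and `fake_chat` are hypothetical, the system prompt text is invented for illustration, and the stub returns a plain string instead of the nested response dictionary. The merge into a single user turn is presumably there because the default model's chat template does not accept a separate system role, but the commit does not say so.

```python
def merge_messages(messages):
    """Collapse a two-message prompt into a single user message."""
    return [
        {
            "role": "user",
            "content": messages[0]["content"] + "\n" + messages[1]["content"],
        }
    ]


def translate_with_fallback(text, models, chat):
    """Try each ';'-separated model until one returns an acceptable reply."""
    maxlen = max(2000, len(text) * 5)  # same length guard as in the diff
    messages = merge_messages(
        [
            {"role": "system", "content": "You are a translator."},  # illustrative
            {"role": "user", "content": text},
        ]
    )
    for model in models.split(";"):
        try:
            reply = chat(model, messages).replace("<end_of_turn>", "")
            if len(reply) > maxlen:
                raise ValueError("Response too long")
            return reply.strip()
        except Exception as exc:  # any failure moves on to the next model
            print(f"{model}: {exc}")
    raise RuntimeError("All models failed")


def fake_chat(model, messages):
    """Stub backend: the first model is unavailable, the second answers."""
    if model == "broken-model":
        raise ConnectionError("model not available")
    return " bonjour <end_of_turn>"


print(translate_with_fallback("hello", "broken-model;gemma-2-it", fake_chat))
```

Note that, as in the diff, the broad `except` also swallows the length-guard error, so an overlong reply simply moves on to the next model.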


class OpenAITranslator(BaseTranslator):
# https://github.com/openai/openai-python
name = "openai"
@@ -303,7 +355,10 @@ def __init__(
             model = self.envs["OPENAI_MODEL"]
         super().__init__(lang_in, lang_out, model)
         self.options = {"temperature": 0}  # Random sampling may break formula markers
-        self.client = openai.OpenAI(base_url=base_url, api_key=api_key)
+        self.client = openai.OpenAI(
+            base_url=base_url or self.envs["OPENAI_BASE_URL"],
+            api_key=api_key or self.envs["OPENAI_API_KEY"],
+        )
         self.prompttext = prompt
         self.add_cache_impact_parameters("temperature", self.options["temperature"])
         if prompt:
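
The `or` fallback introduced for the `openai.OpenAI(...)` call in this hunk can be looked at on its own. In the sketch below, `resolve_setting` and `EXAMPLE_ENVS` are hypothetical, and the key value is a placeholder rather than a real credential.

```python
# Hypothetical settings, shaped like the translator's `envs` dictionary.
EXAMPLE_ENVS = {
    "OPENAI_BASE_URL": "https://api.openai.com/v1",
    "OPENAI_API_KEY": "placeholder-key",  # not a real credential
}


def resolve_setting(explicit, envs, key):
    """Prefer an explicitly passed value; otherwise use the configured one."""
    # `or` treats None and the empty string alike, so "" also falls back.
    return explicit or envs[key]


print(resolve_setting(None, EXAMPLE_ENVS, "OPENAI_BASE_URL"))
print(resolve_setting("http://localhost:8000/v1", EXAMPLE_ENVS, "OPENAI_BASE_URL"))
```

One consequence of using `or` rather than an `is None` check is that a caller cannot deliberately pass an empty string; whether that matters depends on how the constructor is called elsewhere in the project.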
1 change: 1 addition & 0 deletions pyproject.toml
@@ -17,6 +17,7 @@ dependencies = [
     "tenacity",
     "numpy",
     "ollama",
+    "xinference-client",
     "deepl",
     "openai",
     "azure-ai-translation-text<=1.0.1",