Update README.md
JustinHsu1019 authored Nov 14, 2024
1 parent cf9eb8f commit f53f89e
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -94,11 +94,11 @@ To set up the development environment, follow these steps:
```

9. Data preprocessing (this stage was handled by a different team member, so the OS environment is Windows):
- **Tesseract-OCR**
- **Tesseract-OCR**
- Download and install Tesseract-OCR.
- After installation, note the install path (e.g. `C:\Program Files\Tesseract-OCR\tesseract.exe`).

- **Poppler**
- **Poppler**
- Download and install Poppler.
- After installation, note the `poppler_path` (e.g. `C:\Program Files\poppler-24.08.0\Library\bin`).
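
One way to confirm both paths before running any OCR step is sketched below. It uses only the standard library. `check_tool_paths` is a hypothetical helper, not part of this repository, and the two constants are just the example paths from the steps above, so they will likely differ on your machine.

```python
from pathlib import Path

# Example values from the steps above; adjust to your own installation.
TESSERACT_CMD = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
POPPLER_PATH = r"C:\Program Files\poppler-24.08.0\Library\bin"


def check_tool_paths(tesseract_cmd: str, poppler_path: str) -> list[str]:
    """Return a list of problems; an empty list means both paths look usable."""
    problems = []
    # Tesseract is referenced by the executable itself.
    if not Path(tesseract_cmd).is_file():
        problems.append(f"Tesseract executable not found: {tesseract_cmd}")
    # Poppler is referenced by the directory holding its binaries.
    if not Path(poppler_path).is_dir():
        problems.append(f"Poppler bin directory not found: {poppler_path}")
    return problems


if __name__ == "__main__":
    for problem in check_tool_paths(TESSERACT_CMD, POPPLER_PATH):
        print(problem)
```

If the preprocessing scripts rely on `pytesseract` and `pdf2image` (an assumption; this diff does not show their imports), these values are typically supplied through `pytesseract.pytesseract.tesseract_cmd` and the `poppler_path` argument of `pdf2image.convert_from_path`.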

@@ -120,6 +120,8 @@ poppler_path = r"C:\Program Files\poppler-24.08.0\Library\bin"
- `競賽資料集/reference/finance/*.pdf`
- `競賽資料集/reference/insurance/*.pdf`
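
Before running the preprocessing step, it can help to confirm that the PDFs are where the glob patterns above expect them. The following is a minimal sketch, assuming the `競賽資料集` folder sits in the current working directory. `count_reference_pdfs` is a hypothetical helper and is not part of this repository.

```python
from pathlib import Path


def count_reference_pdfs(dataset_root: str = "競賽資料集") -> dict[str, int]:
    """Count *.pdf files directly under reference/finance and reference/insurance."""
    reference_dir = Path(dataset_root) / "reference"
    return {
        category: len(list((reference_dir / category).glob("*.pdf")))
        for category in ("finance", "insurance")
    }


if __name__ == "__main__":
    for category, count in count_reference_pdfs().items():
        print(f"{category}: {count} PDF file(s)")
```

A count of zero for either category suggests the dataset was unpacked to a different location than the one listed above.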

Run the data preprocessing scripts:

```
python3 Preprocess/data_process/data_preprocess.py
python3 Preprocess/data_process/read_pdf_noocr.py