A Node.js module that extracts text from a PDF hosted on Google Drive. It doesn't require the Google Drive API, and it works even when the file is set to "Viewers can't download, print and copy".
Install
npm i gdrive-pdfToText
import gdrivePdf from "gdrive-pdfToText";

const test = async () => {
  const driveUrl =
    "https://drive.google.com/file/d/1pGOAwzneiFqjRYOLvrXaR10U6sffQSCn/view?usp=sharing";
  const text = await gdrivePdf.pdfToText(driveUrl);
  // await gdrivePdf.pdfToText(driveUrl, "password")
  console.log("🎃 Drive Url: ", driveUrl);
  console.log("🎉 Extracted Text:", text);
};

test();
Clone the project
git clone https://github.com/AlecBlance/gdrive-pdfToText
Go to the project directory
cd gdrive-pdfToText
Install dependencies
npm install
Start the test
npm run dev-sample
How it works is simple. 🙌 Every PDF in Google Drive is processed by the /presspage
endpoint, which makes the text in the PDF interactive. We just need to pass a certain id,
which can be found in the request sent to the /meta endpoint, to get the text.
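Before any of those requests can be made, the file has to be identified from its share link. The sketch below only illustrates that assumed first step: `extractDriveFileId` is a hypothetical helper, not part of this module, and it does not reproduce the /meta or /presspage requests (the id passed to /presspage is the one found via /meta, as described above). It handles the two common share-link shapes and does not check the hostname.

```javascript
// Hypothetical helper (not part of gdrive-pdfToText): pulls the file id
// out of a Google Drive share link. Returns null when no id is found.
function extractDriveFileId(driveUrl) {
  let parsed;
  try {
    parsed = new URL(driveUrl);
  } catch {
    return null; // the input is not a valid absolute URL
  }

  // Form 1: https://drive.google.com/file/d/<id>/view?usp=sharing
  const pathMatch = parsed.pathname.match(/\/file\/d\/([^/]+)/);
  if (pathMatch) return pathMatch[1];

  // Form 2: https://drive.google.com/open?id=<id>
  // searchParams.get returns null when the parameter is absent.
  return parsed.searchParams.get("id");
}

console.log(
  extractDriveFileId(
    "https://drive.google.com/file/d/1pGOAwzneiFqjRYOLvrXaR10U6sffQSCn/view?usp=sharing"
  )
); // prints 1pGOAwzneiFqjRYOLvrXaR10U6sffQSCn
```

Returning null instead of throwing keeps the caller free to decide how to report a bad link.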
It can extract text from view-only PDFs! 🎉 As long as the file can be viewed publicly. That's the good thing about this module: it doesn't rely on the Google Drive API, which restricts operations such as retrieving information.
My future project relies on this text extraction feature. Google Drive won't let me download or extract text from PDFs that are only available for viewing. Even Google Apps Script is restrictive. To make my future project possible, I had to overcome this hurdle. 💪😆
Contributions are welcome, of course! 😎 Feel free to contact me or open a PR. I'll review it, and if it is good and valuable, I'll merge it right away.
Client: Node.js
Library: fetch-retry (for timeout errors)