Hi, if there are, say, 100 PDF documents in a folder, is there a way to do a keyword search across all of them without opening each document and searching it individually?
Hi Deepak,
You can certainly do this by using Python to extract the text from each PDF and then searching that text. Python packages such as PDFMiner and pyPdf are a good place to start.
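As a rough sketch of the idea for a single file, assuming the pdfminer.six distribution of PDFMiner is installed (it provides `pdfminer.high_level.extract_text`): the helper names `pdf_to_text` and `contains_keyword`, the file name `report.pdf`, and the keyword `invoice` are only placeholders for illustration.

```python
def pdf_to_text(pdf_path):
    """Return all the text of one PDF as a single string."""
    # Imported inside the function so the keyword check below can be
    # used and tested even where pdfminer.six is not installed.
    from pdfminer.high_level import extract_text
    return extract_text(pdf_path)


def contains_keyword(text, keyword):
    """Case-insensitive check for the keyword anywhere in the text."""
    return keyword.lower() in text.lower()


# Example usage (placeholder file name and keyword):
#   text = pdf_to_text("report.pdf")
#   if contains_keyword(text, "invoice"):
#       print("report.pdf mentions the keyword")
```

Note that scanned PDFs containing only images have no text layer, so a plain text extraction like this will find nothing in them.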
I hope this helps.
Thanks,
Satyapriya
Thanks Satyapriya.
Could you share some links where I can find example implementations using PDFMiner and pyPdf?
Hi Deepak,
You can refer to this link to explore how to use PDFMiner to extract text. You can find more examples here.
Thanks,
Satyapriya
Thanks. But these examples hard-code the document name in the code. My problem is that I have 100 different documents with the .pdf extension, and I want the code to run an automated keyword search on each one. It should at least tell me the names of the documents that contain the keyword, if not the entire sentence in which the keyword is used.
Hi Deepak,
It is a little difficult to find the exact code. Here is some code to read multiple files in a directory; I hope it helps. You can adapt it for the keyword search, using an if statement to check whether the keyword is present in each file.
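Putting those pieces together might look something like the sketch below. It is only an illustration, not tested against your documents: it assumes pdfminer.six is installed for the default extractor, the function names (`pdfminer_text`, `sentences_with`, `search_folder`) are made up for this example, and the sentence splitting is a simple rule that breaks after `.`, `!` or `?`, so it will be imperfect around abbreviations.

```python
import re
from pathlib import Path


def pdfminer_text(pdf_path):
    """Default extractor: pull the text out of one PDF with pdfminer.six."""
    from pdfminer.high_level import extract_text  # assumes pdfminer.six
    return extract_text(str(pdf_path))


def sentences_with(text, keyword):
    """Return the sentences of `text` that mention `keyword` (case-insensitive)."""
    # Collapse line breaks left over from the PDF layout, then split
    # wherever whitespace follows a '.', '!' or '?'.
    flat = " ".join(text.split())
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    return [s for s in sentences if keyword.lower() in s.lower()]


def search_folder(folder, keyword, extract=pdfminer_text):
    """Map each matching PDF's file name to the sentences containing the keyword."""
    hits = {}
    # Only files ending in ".pdf" directly inside `folder` are visited.
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        try:
            text = extract(pdf_path)
        except Exception as err:  # e.g. an encrypted or corrupt file
            print(f"Skipping {pdf_path.name}: {err}")
            continue
        found = sentences_with(text, keyword)
        if found:  # the "if statement" check for the keyword
            hits[pdf_path.name] = found
    return hits


# Example usage (placeholder folder name and keyword):
#   for name, found in search_folder("my_pdfs", "invoice").items():
#       print(name)
#       for sentence in found:
#           print("   ", sentence)
```

The `extract` parameter is there so a different library (pyPdf, for instance) can be swapped in without touching the loop.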
Thanks,
Satyapriya
Thanks