A simple way to extract and parse images for machine learning workflows.
%load_ext autoreload
%autoreload 2
Our base functionality is fairly simple. It must be able to do the following:
- open a PDF file
- iterate through the pages of the file
- for each page, save that page as an image file, with a counter string appended to the file name
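The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's actual implementation: it assumes PyMuPDF (imported as `fitz`) does the rendering, and the names `page_image_path` and `render_pdf_pages`, the `<pdf name>_<counter>.<ext>` naming scheme, and the zero-based counter are all hypothetical.

```python
from pathlib import Path


def page_image_path(pdf_path: Path, destination: Path, page_index: int, ext: str) -> Path:
    """Build the output path for one page: <destination>/<pdf stem>_<counter>.<ext>.

    Hypothetical naming scheme; the counter starts at 0 here by assumption.
    """
    return destination / f"{pdf_path.stem}_{page_index}.{ext}"


def render_pdf_pages(source: Path, destination: Path, ext: str = "png") -> list[Path]:
    """Render every page of every PDF in `source` to an image in `destination`.

    Hypothetical helper. Assumes PyMuPDF is installed (`pip install pymupdf`).
    """
    # Imported here so that the naming helper above works without PyMuPDF.
    import fitz

    destination.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(source.glob("*.pdf")):
        # Open the PDF file.
        with fitz.open(str(pdf_path)) as doc:
            # Iterate through the pages of the file.
            for page_index, page in enumerate(doc):
                out_path = page_image_path(pdf_path, destination, page_index, ext)
                # Rasterize the page and save it; the format follows the extension.
                page.get_pixmap().save(str(out_path))
                written.append(out_path)
    return written
```

Keeping the file naming in its own small function makes the counter convention easy to change and to check without rendering anything.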
source = Path("./tryout/")
destination = Path("./tryout/processed")
extract_images_from_pdfs(source, destination, "png")