Download all the PDF files from a particular web page.
  • get all the links from a list of pages
  • download all the PDF files from a list to a specific directory

add_scheme_and_domain

add_scheme_and_domain(partial_url, main_source)

Given a partial URL and a domain, return a full URL that includes the scheme and domain.
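The behaviour described above can be approximated with the standard library. This is a minimal sketch, not the package's actual implementation: the name add_scheme_and_domain_sketch is hypothetical, and it assumes main_source is a full URL (scheme plus domain, optionally with a path).

```python
from urllib.parse import urljoin


def add_scheme_and_domain_sketch(partial_url, main_source):
    """Hypothetical stand-in: resolve partial_url against main_source."""
    # urljoin leaves absolute URLs unchanged and fills in the scheme and
    # domain for root-relative, relative, and scheme-relative (//host) URLs.
    return urljoin(main_source, partial_url)


if __name__ == "__main__":
    base = "https://example.com/page/index.html"
    print(add_scheme_and_domain_sketch("/docs/a.pdf", base))
    print(add_scheme_and_domain_sketch("https://other.org/b.pdf", base))
```

Links that are already absolute pass through untouched, so the helper can be applied to every href on a page without checking each one first.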

get_pdf_links(url)

Given a URL, return a list of all PDF links on the page.
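One way such a function could work is sketched below using only the standard library. The names extract_pdf_links and get_pdf_links_sketch are hypothetical, and the rule "a PDF link is an anchor whose path ends in .pdf" is an assumption; the real get_pdf_links may use a different HTTP client, parser, or matching rule.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class _AnchorCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_pdf_links(html, base_url):
    """Return absolute, de-duplicated links whose path ends in .pdf."""
    parser = _AnchorCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        full = urljoin(base_url, href)  # make relative links absolute
        # Assumption: match on the URL path, ignoring case and query string.
        if urlparse(full).path.lower().endswith(".pdf") and full not in links:
            links.append(full)
    return links


def get_pdf_links_sketch(url):
    """Hypothetical stand-in: fetch a page and extract its PDF links."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_pdf_links(html, url)
```

Keeping the parsing step separate from the network request makes it possible to check the link extraction against a literal HTML string.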

download_pdf_files

download_pdf_files(pdf_links, destination='.')

Given a list of PDF links, download all the PDFs to the destination directory (the current directory by default).
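The download step might look roughly like the following. This is a sketch under stated assumptions: download_pdf_files_sketch is a hypothetical name, naming each file after the last path segment of its URL is an assumed rule, and returning the saved paths is an addition for illustration. The real function may behave differently on all three points.

```python
import shutil
from pathlib import Path
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def download_pdf_files_sketch(pdf_links, destination="."):
    """Hypothetical stand-in: save each link into destination."""
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    saved = []
    for link in pdf_links:
        # Assumption: the file name is the last segment of the URL path.
        name = Path(unquote(urlparse(link).path)).name or "download.pdf"
        target = dest / name
        # Stream the response to disk instead of holding it all in memory.
        with urlopen(link) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(target)
    return saved
```

Two links that share the same file name would overwrite each other in this sketch; a production version would need to handle name collisions and failed requests.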

If you wish to download the PDF files from a particular webpage, use the following pattern:

download_pdf_files(
    get_pdf_links("https://open.defense.gov/Transparency/FOIA.aspx"), "./test"
)