vwtyler / ocr-pdf

Simple PDF-to-text extraction from a URL using Tesseract

Run time and cost

This model runs on CPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

OCR-PDF Project

Overview

This project extracts text from PDF files using Tesseract Optical Character Recognition (OCR). It downloads a PDF from a given URL, converts each page into an image, and then runs Tesseract on each image to extract the text. The source code is available on GitHub.
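
The three steps described above (download, render pages to images, OCR each image) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the choice of the pdf2image and pytesseract packages (which in turn need Poppler and the Tesseract binary installed) is an assumption. Each step is passed in as a parameter so it can be swapped or stubbed out.

```python
from typing import Callable
from urllib.request import urlopen


def download_pdf(url: str, timeout: float = 30.0) -> bytes:
    """Fetch the raw PDF bytes from a URL using only the standard library."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def render_pages(pdf_bytes: bytes, dpi: int = 200) -> list:
    """Rasterize every page to a PIL image.

    ASSUMPTION: uses the third-party pdf2image package (requires Poppler).
    """
    from pdf2image import convert_from_bytes  # imported lazily on purpose
    return convert_from_bytes(pdf_bytes, dpi=dpi)


def recognize_text(image) -> str:
    """Run Tesseract on one page image.

    ASSUMPTION: uses the third-party pytesseract wrapper.
    """
    from pytesseract import image_to_string  # imported lazily on purpose
    return image_to_string(image)


def ocr_pdf(
    url: str,
    download: Callable[[str], bytes] = download_pdf,
    render: Callable[[bytes], list] = render_pages,
    recognize: Callable[[object], str] = recognize_text,
    page_separator: str = "\n\n",
) -> str:
    """Download a PDF, render each page, OCR it, and join the page texts."""
    pdf_bytes = download(url)
    pages = render(pdf_bytes)
    return page_separator.join(recognize(page).strip() for page in pages)
```

Calling `ocr_pdf("https://example.com/file.pdf")` (a placeholder URL) would run the whole pipeline. A higher `dpi` generally helps Tesseract with small print, at the cost of slower rendering.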

Usage

Provide the URL of a PDF, and the model returns the text extracted from it.
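
As a rough sketch, the model could be called from Python through the Replicate client library. Several details here are assumptions rather than facts from this README: the input field name `url` is a guess (check the model's input schema), the helper `extract_text` is hypothetical, and the model reference may need a `:<version>` suffix copied from the model page. The client expects a `REPLICATE_API_TOKEN` environment variable.

```python
from typing import Any, Callable, Optional

MODEL_REF = "vwtyler/ocr-pdf"  # owner/name; may need ":<version>" appended
INPUT_KEY = "url"  # ASSUMPTION: the real input field name may differ


def extract_text(pdf_url: str, run: Optional[Callable[..., Any]] = None) -> str:
    """Send a PDF URL to the hosted model and return the extracted text."""
    if not pdf_url.lower().startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URL pointing at a PDF")
    if run is None:
        import replicate  # third-party client, imported lazily
        run = replicate.run
    output = run(MODEL_REF, input={INPUT_KEY: pdf_url})
    # The output may be a single string or an iterable of text chunks.
    if isinstance(output, str):
        return output
    return "".join(str(part) for part in output)
```

The `run` parameter exists only so the call can be replaced with a stub when no API token is available.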

License

This project is licensed under the MIT License.