ReplyOMeter/Ingest/Ingest Design.md
2024-10-27 09:11:51 -04:00

Ingest Overview

The purpose of the Ingest system is to digitize the source material.

The Ingestion pipeline performs the following steps for each input file:

  1. Extract and normalize text
     a. Identify the regions of the input file that contain text
     b. Run Handwritten Text Recognition (HTR) to convert those regions to text
     c. Detect the language of the text
     d. Route the output to human correction, and feed the corrections back into the model fine-tuning queue
  2. Extract photographs
     a. Identify the regions of the input file that contain a photograph
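The steps above can be sketched as a small pipeline skeleton. This is only an illustration of the control flow, not a chosen design: the `IngestResult` type, the `ingest` function, the `Box` region format, and the four callables passed in are all hypothetical names, and each callable stands in for whichever technology is eventually selected.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# (left, top, right, bottom) in pixels; a hypothetical region representation.
Box = Tuple[int, int, int, int]


@dataclass
class IngestResult:
    text_regions: List[Box] = field(default_factory=list)
    photo_regions: List[Box] = field(default_factory=list)
    text: str = ""
    language: Optional[str] = None
    # Step 1d: every result starts out waiting for human correction.
    needs_review: bool = True


def ingest(
    path: str,
    find_text_regions: Callable[[str], List[Box]],
    run_htr: Callable[[str, List[Box]], str],
    detect_language: Callable[[str], Optional[str]],
    find_photo_regions: Callable[[str], List[Box]],
) -> IngestResult:
    """Run steps 1a-1c and 2a for one input file."""
    result = IngestResult()
    result.text_regions = find_text_regions(path)      # step 1a
    result.text = run_htr(path, result.text_regions)   # step 1b
    result.language = detect_language(result.text)     # step 1c
    result.photo_regions = find_photo_regions(path)    # step 2a
    return result


# Usage with stand-in stubs; a real run would plug in the selected services.
example = ingest(
    "page-001.png",
    find_text_regions=lambda p: [(0, 0, 100, 40)],
    run_htr=lambda p, regions: "Sehr geehrter Herr",
    detect_language=lambda text: "de",
    find_photo_regions=lambda p: [],
)
print(example.language, example.needs_review)
```

Passing the steps in as callables keeps the skeleton independent of the technology choice discussed below, so a candidate can be swapped without touching the pipeline itself.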

Technologies

Evaluation criteria

Below is a prioritized list of the evaluation criteria. Aligning on these criteria is the most important step before choosing a tech stack.

  1. Accuracy: how accurate is the model at recognizing handwritten letters in the target languages (English, German, and Hebrew)?
  2. Tuning: how easy is it to tune the model based on human feedback?
  3. Price: how much does each run and each tuning pass cost?
  4. Simplicity: how much work is it to integrate with the model? For example, does it align with the tech stack in the other systems?
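One way to read "prioritized" is as a strict ordering: compare candidates on accuracy first, and fall back to the next criterion only on a tie. The sketch below applies that reading to the letter grades already filled in for the candidates in this document. The numeric grade scale, the `rank` helper, and the choice to treat a missing grade as worse than an F are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed numeric scale for the letter grades used in this document.
GRADE_POINTS = {"A+": 4.3, "A": 4.0, "B+": 3.3, "B": 3.0, "C": 2.0, "F": 0.0}

# Criteria in priority order, most important first.
CRITERIA = ("accuracy", "tuning", "price", "simplicity")

# Grades copied from the candidate sections; blank or "?" entries are omitted.
CANDIDATES: Dict[str, Dict[str, str]] = {
    "Amazon Textract": {"tuning": "F", "price": "B+", "simplicity": "B"},
    "Transkribus": {"accuracy": "A+", "tuning": "A", "price": "B", "simplicity": "B"},
    "Handwriting OCR": {"tuning": "F", "price": "C", "simplicity": "B"},
}


def sort_key(grades: Dict[str, str]) -> Tuple[float, ...]:
    # Assumption: an ungraded criterion sorts below an F (-1.0).
    return tuple(GRADE_POINTS.get(grades.get(c, ""), -1.0) for c in CRITERIA)


def rank(candidates: Dict[str, Dict[str, str]]) -> List[str]:
    """Order candidate names best-first by strict criterion priority."""
    return sorted(candidates, key=lambda name: sort_key(candidates[name]), reverse=True)


print(rank(CANDIDATES))
```

Because tuples compare element by element, a better accuracy grade always wins regardless of the later criteria. A weighted sum would be a reasonable alternative if trade-offs between criteria should be allowed.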

Hugging Face vision-language model (VLM) accuracy leaderboard: https://huggingface.co/spaces/opencompass/open_vlm_leaderboard

(A) Amazon Textract

URL: https://aws.amazon.com/pm/textract/

  1. Accuracy: ?
  2. Tuning: F.
  3. Price: B+. ~$15 per million pages. URL: https://aws.amazon.com/textract/pricing/
  4. Simplicity: B.

(B) DocuPanda

URL: https://www.docupanda.io

  1. Accuracy:
  2. Tuning:
  3. Price:
  4. Simplicity:

(C) Transkribus (Proposed)

URL: https://www.transkribus.org

  1. Accuracy: A+
  2. Tuning: A. Available via the API. URL: https://www.transkribus.org/ai-training
  3. Price: B. 0-60 Euros per month. URL: https://www.transkribus.org/plans
  4. Simplicity: B. Cannot detect language. Integration is via the metagrapho API. URL: https://www.transkribus.org/metagrapho
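Since Transkribus cannot detect language, step 1c of the pipeline would need a separate check. As a partial illustration, Hebrew can be told apart from English and German by script alone, using Python's standard `unicodedata` module. The `dominant_script` helper is a hypothetical name, and this sketch does not distinguish English from German, which share the Latin script and would need a real language-identification step.

```python
import unicodedata
from typing import Optional


def dominant_script(text: str) -> Optional[str]:
    """Return "hebrew" or "latin" for the majority script, or None if neither appears."""
    counts = {"HEBREW": 0, "LATIN": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace, and combining marks
        # Unicode character names begin with the script, e.g. "HEBREW LETTER ALEF".
        name = unicodedata.name(ch, "")
        for script in counts:
            if name.startswith(script):
                counts[script] += 1
    if counts["HEBREW"] == 0 and counts["LATIN"] == 0:
        return None
    return "hebrew" if counts["HEBREW"] > counts["LATIN"] else "latin"


print(dominant_script("שלום עולם"))
print(dominant_script("Sehr geehrter Herr Müller"))
```

A script check like this could also be used to pick which recognition model to run on a region before transcription, assuming separate models per script.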

(D) ChatGPT

  1. Accuracy:
  2. Tuning:
  3. Price:
  4. Simplicity:

(E) LLaVA

URL: https://llava-vl.github.io

  1. Accuracy:
  2. Tuning:
  3. Price: A+, Free on-device
  4. Simplicity:

(F) InternVL2-Llama3-76B

URL: https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B

  1. Accuracy:
  2. Tuning:
  3. Price: A+, Free on-device
  4. Simplicity:

(G) Handwriting OCR

URL: https://www.handwritingocr.com

  1. Accuracy:
  2. Tuning: F, not available
  3. Price: C, $0.06-$0.12 per page. URL: https://www.handwritingocr.com/#pricing
  4. Simplicity: B, URL: https://www.handwritingocr.com/api/docs