ReplyOMeter/Ingest/Ingest Design.md
2024-10-27 09:11:51 -04:00

# Ingest Overview
The purpose of the Ingest system is to digitize the source material.
The Ingestion pipeline performs the following steps for each input file:
1. Extract and normalize text
   a. Identify the region in the input file with text
   b. Run Handwritten Text Recognition (HTR) to convert the handwriting to text
   c. Recognize the text language
   d. Provide human correction and feed the corrections back into the model fine-tuning queue
2. Extract photographs
   a. Identify the region in the input file with a photograph
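
The steps above can be sketched as a single function that wires the stages together. This is a minimal sketch, not the actual implementation: `Region`, `IngestResult`, `normalize`, and `ingest` are hypothetical names, and the region finder, HTR engine, and language detector are passed in as plain callables so that any of the candidate technologies could be plugged in. Normalization is assumed here to mean Unicode NFC plus whitespace collapsing.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Region:
    """A rectangular area of the input image (hypothetical layout-analysis output)."""
    kind: str  # "text" or "photograph"
    x: int
    y: int
    width: int
    height: int


@dataclass
class IngestResult:
    """Everything extracted from one input file."""
    source: str
    text: str = ""
    language: Optional[str] = None
    photographs: List[Region] = field(default_factory=list)


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def ingest(
    source: str,
    find_regions: Callable[[str], List[Region]],
    recognize_text: Callable[[str, Region], str],
    detect_language: Callable[[str], str],
    review_queue: list,
) -> IngestResult:
    """Run the ingest steps for one input file."""
    regions = find_regions(source)
    result = IngestResult(source=source)

    # Steps 1a and 1b: run HTR on every text region, then normalize.
    raw = " ".join(recognize_text(source, r) for r in regions if r.kind == "text")
    result.text = normalize(raw)

    # Step 1c: detect the language only when some text was recognized.
    if result.text:
        result.language = detect_language(result.text)

    # Step 1d: queue the result for human correction and later fine-tuning.
    review_queue.append(result)

    # Step 2a: keep the photograph regions for extraction.
    result.photographs = [r for r in regions if r.kind == "photograph"]
    return result
```

Passing the stages in as callables keeps the pipeline independent of the technology choice discussed below, so swapping the HTR engine would not change the pipeline code.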
# Technologies
## Evaluation criteria
Below is a prioritized list of the evaluation criteria. Aligning on these criteria is the most important step before choosing a tech stack.
1. Accuracy: how accurate is the model at recognizing *handwritten* letters in the target languages (English, German, and Hebrew)?
2. Tuning: how easy is it to tune the model based on human feedback?
3. Price: how much does it cost per run and per tuning pass?
4. Simplicity: how much work is it to integrate with the model? For example, does it align with the tech stack used in the other systems?
Hugging Face VLM accuracy leaderboard: https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
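
Because the criteria are prioritized, one way to compare candidates is to rank them lexicographically: accuracy first, with tuning, price, and simplicity acting as successive tie-breakers. The sketch below is an illustration under stated assumptions, not an agreed scoring method. The grade-to-number mapping is an assumed GPA-style scale, missing grades are assumed to rank below every real grade, and only the three candidates with the most grades filled in this document are included.

```python
# Assumed GPA-style mapping from letter grades to numbers.
GRADE_POINTS = {
    "A+": 4.3, "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D": 1.0, "F": 0.0,
}

# Criteria in priority order, as listed above.
CRITERIA = ("accuracy", "tuning", "price", "simplicity")

# Assumption: an ungraded criterion sorts below every real grade.
MISSING = -1.0


def score(grades: dict) -> tuple:
    """Return a tuple that sorts by the highest-priority criterion first."""
    return tuple(GRADE_POINTS.get(grades.get(c), MISSING) for c in CRITERIA)


# Grades copied from the sections of this document; ungraded criteria are omitted.
candidates = {
    "Amazon Textract": {"tuning": "F", "price": "B+", "simplicity": "B"},
    "Transkribus": {"accuracy": "A+", "tuning": "A", "price": "B", "simplicity": "B"},
    "Handwriting OCR": {"tuning": "F", "price": "C", "simplicity": "B"},
}

ranking = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
for name in ranking:
    print(name, score(candidates[name]))
```

A weighted sum would be an alternative, but it lets a cheap, inaccurate option outrank an accurate one, which contradicts the stated priority order.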
## (A) Amazon Textract
URL: https://aws.amazon.com/pm/textract/
1. Accuracy: ?
2. Tuning: F.
3. Price: B+. ~$15 per million pages. URL: https://aws.amazon.com/textract/pricing/
4. Simplicity: B.
## (B) DocuPanda
URL: https://www.docupanda.io
1. Accuracy:
2. Tuning:
3. Price:
4. Simplicity:
## (C) Transkribus (Proposed)
URL: https://www.transkribus.org
1. Accuracy: A+
2. Tuning: A. Available via the API. URL: https://www.transkribus.org/ai-training
3. Price: B. 0-60 Euros per month. URL: https://www.transkribus.org/plans
4. Simplicity: B. Cannot detect the text language. Integration is via the metagrapho API. URL: https://www.transkribus.org/metagrapho
## (D) ChatGPT
1. Accuracy:
2. Tuning:
3. Price:
4. Simplicity:
## (E) LLaVA
URL: https://llava-vl.github.io
1. Accuracy:
2. Tuning:
3. Price: A+, Free on-device
4. Simplicity:
## (F) InternVL2-Llama3-76B
URL: https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B
1. Accuracy:
2. Tuning:
3. Price: A+, Free on-device
4. Simplicity:
## (G) Handwriting OCR
URL: https://www.handwritingocr.com
1. Accuracy:
2. Tuning: F, not available
3. Price: C, $0.06-$0.12 per page. URL: https://www.handwritingocr.com/#pricing
4. Simplicity: B, URL: https://www.handwritingocr.com/api/docs
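
To make the per-page price above easier to compare with flat monthly plans, the total can be estimated for a given archive size. This is a small arithmetic sketch: `cost_range_usd` is a hypothetical helper, the 5,000-page archive is an assumed figure chosen only for illustration, and prices are kept in whole cents to avoid floating-point rounding.

```python
def cost_range_usd(pages: int, low_cents_per_page: int, high_cents_per_page: int) -> tuple:
    """Return the (low, high) total cost in US dollars for a per-page price range."""
    low = pages * low_cents_per_page / 100
    high = pages * high_cents_per_page / 100
    return low, high


# Handwriting OCR is listed above at $0.06-$0.12 per page.
ARCHIVE_PAGES = 5_000  # hypothetical archive size, not a project figure
low, high = cost_range_usd(ARCHIVE_PAGES, 6, 12)
print(f"{ARCHIVE_PAGES} pages: ${low:,.2f} to ${high:,.2f}")
```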