2024-10-27 09:11:51 -04:00

# Process Overview
The Process system extracts metadata from the ingested material and outputs it in a format that the next stage can use.
The Process pipeline performs the following steps for each input entity:
1. Normalization: translate all source material into the working language (English or Hebrew? To be decided).
2. Metadata: annotate each entity with relevant metadata, such as locations, dates/times, and actors.
3. Reconciliation: map each new entity to existing entities, creating or updating the canonical entity graph.
4. RAG preparation: chunk the data, create embeddings, and store them in a vector database.
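The four steps above can be read as a sequence of stages applied to each input entity. The following is a minimal Python outline under stated assumptions, not the chosen implementation: the names `InputEntity`, `normalize`, `extract_metadata`, `reconcile`, `chunk`, and `process` are hypothetical; translation and metadata extraction are placeholders for whichever model is selected; reconciliation is approximated by matching trimmed, lower-cased names; and chunking uses fixed-size word windows with overlap. Embedding and vector-database storage are left out.

```python
from dataclasses import dataclass, field


@dataclass
class InputEntity:
    # Hypothetical shape of one ingested item; field names are illustrative.
    raw_content: str
    names: list[str] = field(default_factory=list)
    content: str = ""
    metadata: dict[str, list[str]] = field(default_factory=dict)


def normalize(entity: InputEntity) -> InputEntity:
    # Step 1 placeholder: a real version would call the chosen translation
    # model. Here the raw text is copied through unchanged.
    entity.content = entity.raw_content
    return entity


def extract_metadata(entity: InputEntity) -> InputEntity:
    # Step 2 placeholder: a real version would extract dates/times,
    # locations, and actors. Here only the already-known names are recorded.
    entity.metadata = {"persons": list(entity.names)}
    return entity


def reconcile(entity: InputEntity, graph: dict[str, list[InputEntity]]) -> None:
    # Step 3, simplified: key canonical persons by a trimmed, lower-cased name
    # and attach the entity to each one, creating missing entries.
    for key in {name.strip().lower() for name in entity.names}:
        graph.setdefault(key, []).append(entity)


def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    # Step 4, partial: split into windows of `size` words, each sharing
    # `overlap` words with the previous one. Embedding and vector-database
    # storage are not covered here.
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + size]) for i in starts]


def process(entity: InputEntity, graph: dict[str, list[InputEntity]]) -> list[str]:
    # Run the four stages in order and return the chunks ready for embedding.
    entity = extract_metadata(normalize(entity))
    reconcile(entity, graph)
    return chunk(entity.content)
```

Calling `process` once per ingested entity with a shared `graph` dict accumulates the canonical links across the corpus. The window size and overlap are placeholders to tune once the embedding model is chosen.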
# Objects
Canonical Entity:
- Type: Person
- Name: [list_of_names]
- References: [entities]
- Metadata:
  - [Date/time]
  - [Location]
  - [Person]

Entity:
- Type: Letter, Photograph
- Metadata
- Content: english_text
- Raw Content: original_text
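The two objects above could be modeled as Python dataclasses. This is a minimal sketch under assumptions: the class names `EntityType` and `Metadata`, the field names (`names` for the list of names, `references` for linked entities), and the `add_reference` helper are hypothetical, and metadata values are assumed to be plain strings since the outline does not specify their format.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    # Entity types named in the outline.
    PERSON = "Person"
    LETTER = "Letter"
    PHOTOGRAPH = "Photograph"


@dataclass
class Metadata:
    # Metadata categories from the outline, stored as plain strings here.
    dates: list[str] = field(default_factory=list)
    locations: list[str] = field(default_factory=list)
    persons: list[str] = field(default_factory=list)


@dataclass
class Entity:
    # One ingested item, such as a letter or a photograph.
    type: EntityType
    raw_content: str  # original_text, in the source language
    content: str = ""  # english_text, filled in by normalization
    metadata: Metadata = field(default_factory=Metadata)


@dataclass
class CanonicalEntity:
    # A reconciled real-world person that ingested entities refer to.
    type: EntityType = EntityType.PERSON
    names: list[str] = field(default_factory=list)
    references: list[Entity] = field(default_factory=list)
    metadata: Metadata = field(default_factory=Metadata)

    def add_reference(self, entity: Entity) -> None:
        # Hypothetical helper: link an entity once, comparing by identity so
        # that two distinct items with identical text are both kept.
        if all(ref is not entity for ref in self.references):
            self.references.append(entity)
```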
# Technologies
## Evaluation criteria
Below is the prioritized list of evaluation criteria. Aligning on these criteria is the most important step before choosing the tech stack.
1. Normalization: translate between languages
2. Metadata: extract metadata (dates/times, locations, actors) from text
3. RAG prep: chunking, embedding, vector database
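Because the criteria are ranked, one way to compare candidate stacks is a weighted score in which higher-priority criteria count more. This sketch assumes rank-based weights (3 for the top criterion down to 1) and per-criterion scores on a 0 to 5 scale filled in by reviewers; the weighting scheme and the names `rank_weights` and `score_stack` are illustrative assumptions, not part of the plan.

```python
from collections.abc import Mapping, Sequence

# Evaluation criteria from the list above, highest priority first.
CRITERIA: tuple[str, ...] = ("Normalization", "Metadata", "RAG prep")


def rank_weights(criteria: Sequence[str]) -> dict[str, int]:
    # The top-ranked criterion gets weight len(criteria); the last gets 1.
    n = len(criteria)
    return {name: n - i for i, name in enumerate(criteria)}


def score_stack(scores: Mapping[str, float], criteria: Sequence[str] = CRITERIA) -> float:
    # Weighted average of per-criterion scores (assumed 0-5 scale).
    # A criterion missing from `scores` counts as 0.
    weights = rank_weights(criteria)
    total = sum(weights[c] * scores.get(c, 0.0) for c in criteria)
    return total / sum(weights.values())
```

For example, hypothetical scores of 4, 3, and 2 for Normalization, Metadata, and RAG prep give (4·3 + 3·2 + 2·1) / 6 ≈ 3.33.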
## (A) Llama 3.2
- URL: https://www.llama.com
- Supports: multilingual translation, metadata extraction, embedding