Process Overview
The purpose of the Process system is to extract metadata from the ingested material in a format useful for the next stage.
The Process pipeline performs the following steps for each input entity:
- Normalization: translate all source material into the working language (English? Hebrew?)
- Metadata: annotate each entity with relevant metadata, such as locations, dates/times, and actors.
- Reconciliation: map each new entity to existing entities, creating or updating the canonical entity graph.
- RAG preparation: chunk the data, create embeddings, and store them in a vector database.
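The four steps above can be sketched as a chain of small functions. This is a minimal sketch under stated assumptions, not the project's implementation: `ProcessedEntity`, `process`, and the stage function names are hypothetical. The translator is passed in as a callable because no model has been chosen yet, metadata extraction is reduced to a toy regex for four-digit years, reconciliation is a plain substring match against a name-to-id mapping, and embedding and vector storage are left out.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ProcessedEntity:
    """Hypothetical container carried through the pipeline."""
    raw_content: str
    content: str = ""
    metadata: Dict[str, List[str]] = field(default_factory=dict)
    canonical_ids: List[str] = field(default_factory=list)
    chunks: List[str] = field(default_factory=list)


def normalize(entity: ProcessedEntity, translate: Callable[[str], str]) -> ProcessedEntity:
    # Normalization: translate the raw text into the working language.
    entity.content = translate(entity.raw_content)
    return entity


def annotate(entity: ProcessedEntity) -> ProcessedEntity:
    # Metadata: toy extractor that only finds four-digit years (1800-2099).
    entity.metadata["dates"] = re.findall(r"\b(1[89]\d{2}|20\d{2})\b", entity.content)
    return entity


def reconcile(entity: ProcessedEntity, known_names: Dict[str, str]) -> ProcessedEntity:
    # Reconciliation: assumed here to be a substring match on lowercase names.
    text = entity.content.lower()
    for name, canonical_id in known_names.items():
        if name in text:
            entity.canonical_ids.append(canonical_id)
    return entity


def chunk(entity: ProcessedEntity, size: int = 200, overlap: int = 50) -> ProcessedEntity:
    # RAG preparation: fixed-size character chunks with overlap.
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    text, start, chunks = entity.content, 0, []
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += size - overlap
    entity.chunks = chunks
    return entity


def process(raw: str, translate: Callable[[str], str], known_names: Dict[str, str]) -> ProcessedEntity:
    # Run the four stages in the order listed above.
    entity = ProcessedEntity(raw_content=raw)
    entity = normalize(entity, translate)
    entity = annotate(entity)
    entity = reconcile(entity, known_names)
    return chunk(entity)
```

Keeping each stage as a separate function means the translation model, the metadata extractor, and the chunking strategy can each be swapped once the evaluation is done, without touching the other stages.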
Objects
Canonical Entity:
- Type: Person
- Name: [list_of_names]
- References: [entities]
Metadata:
- [Date/time]
- [Location]
- [Person]
Entity:
- Type: Letter, Photograph
- Metadata
- Content: english_text
- Raw Content: original_text
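The objects above could be modeled as plain dataclasses. The following is a sketch under assumptions: the field names and types are guesses derived from the listing, Metadata is treated as its own object, and the `reconcile` helper with its case-insensitive name matching is hypothetical, shown only to illustrate how an Entity might be linked into the canonical graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Metadata:
    # Assumed to hold plain strings; real values may be richer types.
    dates: List[str] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)
    persons: List[str] = field(default_factory=list)


@dataclass
class Entity:
    type: str          # e.g. "Letter" or "Photograph"
    raw_content: str   # original_text
    content: str       # english_text (or whichever working language is chosen)
    metadata: Metadata = field(default_factory=Metadata)


@dataclass
class CanonicalEntity:
    type: str                                   # e.g. "Person"
    names: List[str]                            # all known spellings of the name
    references: List[Entity] = field(default_factory=list)


def reconcile(entity: Entity, graph: Dict[str, CanonicalEntity]) -> None:
    """Link an entity to the canonical person for each name in its metadata.

    Hypothetical matching rule: names are compared case-insensitively.
    A new canonical entity is created when no match exists.
    """
    for person in entity.metadata.persons:
        key = person.casefold()
        canonical = graph.get(key)
        if canonical is None:
            canonical = CanonicalEntity(type="Person", names=[person])
            graph[key] = canonical
        elif person not in canonical.names:
            canonical.names.append(person)  # record the new spelling
        canonical.references.append(entity)
```

A real reconciliation step would likely need fuzzier matching (transliteration variants, nicknames, maiden names); exact case-insensitive matching is used here only to keep the example short.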
Technologies
Evaluation criteria
Below is a prioritized list of the evaluation criteria. Aligning on this list is the most important step before choosing a tech stack.
- Normalization: translate between languages
- Metadata: extract insights from text
- RAG prep: chunking, embedding, vector DB
(A) Llama 3.2
Supports: multilingual translation, metadata extraction, embedding