RAG
RAG setup module for initializing retrieval-augmented generation chains.
create_docs(project_name, chunk_size, chunk_overlap)
Create and return documents for a given project.
Loads, splits, and stores the content to be used for embedding and
retrieval. The project_name/ directory must exist under the data
directory with its knowledge base of .txt files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `project_name` | `str` | The project identifier. | required |
| `chunk_size` | `int` | The maximum size of each text chunk. | required |
| `chunk_overlap` | `int` | The number of characters to overlap between chunks. | required |
Returns:
| Type | Description |
|---|---|
| `list[Document]` | A list of |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the |
| `FileNotFoundError` | If the |
| `RuntimeError` | If it fails to read any file from the knowledge base. |
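
The load-and-split step described for `create_docs` can be sketched in plain Python. This is a minimal illustration, not the code in `ragbot\rag.py`: the helper names `split_text` and `load_chunks`, the dictionary shape used in place of `Document`, and the exact conditions under which each exception is raised are all assumptions. The real function most likely delegates splitting to a LangChain text splitter, which this page does not show.

```python
from pathlib import Path


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters. Hypothetical helper.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def load_chunks(kb_dir: Path, chunk_size: int, chunk_overlap: int) -> list[dict]:
    """Read every .txt file in kb_dir and return its chunks with source metadata.

    Hypothetical stand-in for create_docs; plain dicts replace Document objects.
    """
    if not kb_dir.is_dir():
        # Assumption: a missing knowledge base directory is reported this way.
        raise FileNotFoundError(f"knowledge base directory not found: {kb_dir}")
    docs = []
    for path in sorted(kb_dir.glob("*.txt")):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError) as exc:
            raise RuntimeError(f"failed to read {path}") from exc
        for chunk in split_text(text, chunk_size, chunk_overlap):
            docs.append({"page_content": chunk, "metadata": {"source": path.name}})
    return docs
```

With `chunk_size=4` and `chunk_overlap=1`, the string `"abcdefghij"` is split into `"abcd"`, `"defg"`, and `"ghij"`: each chunk starts three characters after the previous one, so neighbouring chunks share one character.
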
Source code in ragbot\rag.py, lines 115-176.
setup(project_name, llm_provider, llm, llm_temperature, llm_top_p, llm_top_k, embeddings_provider, embedding_model, chunk_size, chunk_overlap, search_type, k_docs)
Set up and return a RAG retrieval chain.
Initializes a language model and embeddings, chunks input documents, and creates a retrieval-augmented generation chain using LangChain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `project_name` | `str` | Name of the project. | required |
| `llm_provider` | `str` | Provider name for the language model (e.g., "google", "ollama"). | required |
| `llm` | `str` | Identifier or model name for the language model. | required |
| `llm_temperature` | `float` | Sampling temperature for language generation. | required |
| `llm_top_p` | `float` | Nucleus sampling top-p value. | required |
| `llm_top_k` | `int` | Top-k sampling value. | required |
| `embeddings_provider` | `str` | Provider name for embeddings. | required |
| `embedding_model` | `str` | Identifier for the embeddings model. | required |
| `chunk_size` | `int` | Maximum number of characters per document chunk. | required |
| `chunk_overlap` | `int` | Number of overlapping characters between chunks. | required |
| `search_type` | `str` | Type of search for the retriever (e.g., "similarity", "mmr"). | required |
| `k_docs` | `int` | Number of top documents to retrieve. | required |
Returns:
| Type | Description |
|---|---|
| `Runnable` | A |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If there is no |
| `RuntimeError` | If the system prompt file cannot be read, or if it does not contain the required |
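
To make the roles of `search_type` and `k_docs` concrete, the sketch below ranks toy embedding vectors by cosine similarity and keeps the top `k_docs` indices, which is roughly what a `"similarity"` retriever does. It is an illustration under stated assumptions only: the helpers `cosine` and `top_k` are hypothetical, the vectors are made up, and the real chain relies on LangChain and a vector store rather than this code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k_docs: int) -> list[int]:
    """Return the indices of the k_docs vectors most similar to query_vec.

    Hypothetical stand-in for a "similarity" retriever over a vector store.
    """
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k_docs]


# Toy embeddings: document 1 points the same way as the query, document 2 partly.
query = [1.0, 0.0]
documents = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
best = top_k(query, documents, k_docs=2)
```

Here `best` is `[1, 2]`. An `"mmr"` retriever would additionally penalize candidates that are too similar to documents already selected; that step is not shown.
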
Source code in ragbot\rag.py, lines 18-112.