Embeddings
GPT4All supports generating high-quality embeddings of arbitrary-length text using any embedding model supported by llama.cpp.
An embedding is a vector representation of a piece of text. Embeddings are useful for tasks such as retrieval for question answering (including retrieval augmented generation or RAG), semantic similarity search, classification, and topic clustering.
Supported Embedding Models
The following models have built-in support in Embed4All:
Name | Embed4All model_name | Context Length | Embedding Length | File Size
---|---|---|---|---
SBert | all-MiniLM-L6-v2.gguf2.f16.gguf | 512 | 384 | 44 MiB
Nomic Embed v1 | nomic-embed-text-v1.f16.gguf | 2048 | 768 | 262 MiB
Nomic Embed v1.5 | nomic-embed-text-v1.5.f16.gguf | 2048 | 64-768 | 262 MiB
The context length is the maximum number of word pieces, or tokens, that a model can embed at once. Embedding texts longer than a model's context length requires some kind of strategy; see Embedding Longer Texts for more information.
The embedding length is the size of the vector returned by `Embed4All.embed`.
Quickstart
Generating Embeddings
By default, embeddings will be generated on the CPU using all-MiniLM-L6-v2.
You can also use the GPU to accelerate the embedding model by specifying the `device` parameter. See the `GPT4All` constructor for more information.
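A minimal sketch of the default workflow, assuming the `gpt4all` Python package is installed (the model file is downloaded automatically on first use):

```python
from gpt4all import Embed4All

text = 'The quick brown fox jumps over the lazy dog'
embedder = Embed4All()  # defaults to all-MiniLM-L6-v2 on the CPU
output = embedder.embed(text)
print(len(output))  # 384, the embedding length of all-MiniLM-L6-v2
```

To run on a GPU instead, something like `Embed4All(device='gpu')` should work; the accepted device strings are documented on the `GPT4All` constructor.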
Nomic Embed
Embed4All has built-in support for Nomic's open-source embedding model, Nomic Embed. When using this model, you must specify the task type using the `prefix` argument. This may be one of `search_query`, `search_document`, `classification`, or `clustering`. For retrieval applications, you should use the `search_document` prefix for all of your documents and the `search_query` prefix for your queries, as in the sketch below. See the Nomic Embedding Guide for more info.
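For example, a retrieval-oriented sketch using Nomic Embed v1 from the table above (the texts are placeholders):

```python
from gpt4all import Embed4All

embedder = Embed4All('nomic-embed-text-v1.f16.gguf')

# Prefix documents with the search_document task type...
doc_embeddings = embedder.embed(
    ['GPT4All runs language models locally.', 'Embeddings map text to vectors.'],
    prefix='search_document',
)

# ...and queries with search_query, so both sides share the retrieval task space.
query_embedding = embedder.embed('how do I run a model locally?', prefix='search_query')
```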
Embedding Longer Texts
Embed4All accepts a parameter called `long_text_mode`. This controls the behavior of Embed4All for texts longer than the context length of the embedding model.
In the default mode of "mean", Embed4All will break long inputs into chunks and average their embeddings to compute the final result.
To change this behavior, you can set the `long_text_mode` parameter to "truncate", which will truncate the input to the context length of the model before generating a single embedding.
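A sketch of both modes; `long_text` stands in for any input longer than the model's context length:

```python
from gpt4all import Embed4All

embedder = Embed4All()
long_text = 'some very long text ...'  # placeholder for an input longer than 512 tokens

# Default: split into context-sized chunks and average the chunk embeddings.
mean_embedding = embedder.embed(long_text, long_text_mode='mean')

# Alternative: drop everything past the context length and embed once.
truncated_embedding = embedder.embed(long_text, long_text_mode='truncate')
```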
Batching
You can send multiple texts to Embed4All in a single call. This can give faster results when individual texts are significantly smaller than `n_ctx` tokens. (`n_ctx` defaults to 2048.)
The number of texts that can be embedded in one pass of the model is proportional to the `n_ctx` parameter of Embed4All. Increasing it may increase batched embedding throughput if you have a fast GPU, at the cost of VRAM.
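A sketch of batched embedding; the larger `n_ctx` value and the `device` setting here are illustrative (`device` is Embed4All's own parameter, while `n_ctx` is forwarded to the `GPT4All` constructor via `kwargs`):

```python
from gpt4all import Embed4All

# A larger n_ctx allows more texts per pass, at the cost of VRAM.
embedder = Embed4All(n_ctx=4096, device='gpu')

texts = ['first document', 'second document', 'third document']
embeddings = embedder.embed(texts)  # one embedding (list of floats) per input text
```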
Resizable Dimensionality
The embedding dimension of Nomic Embed v1.5 can be resized using the `dimensionality` parameter. This parameter supports any value between 64 and 768.
Shorter embeddings use less storage, memory, and bandwidth with a small performance cost. See the blog post for more info.
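For example, requesting 64-dimensional embeddings from Nomic Embed v1.5:

```python
from gpt4all import Embed4All

embedder = Embed4All('nomic-embed-text-v1.5.f16.gguf')
embedding = embedder.embed('hello world', dimensionality=64)
print(len(embedding))  # 64 instead of the full 768
```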
API documentation
Embed4All
Python class that handles embeddings for GPT4All.
Source code in gpt4all/gpt4all.py
`__init__(model_name=None, *, n_threads=None, device=None, **kwargs)`
Constructor
Parameters:
- `n_threads` (`int | None`, default: `None`) – Number of CPU threads used by GPT4All. Default is None, in which case the number of threads is determined automatically.
- `device` (`str | None`, default: `None`) – The processing unit on which the embedding model will run. See the `GPT4All` constructor for more info.
- `kwargs` (`Any`, default: `{}`) – Remaining keyword arguments are passed to the `GPT4All` constructor.
Source code in gpt4all/gpt4all.py
`close()`
`embed(text, *, prefix=None, dimensionality=None, long_text_mode='mean', return_dict=False, atlas=False, cancel_cb=None)`
Generate one or more embeddings.
Parameters:
- `text` (`str | list[str]`) – A text or list of texts to generate embeddings for.
- `prefix` (`str | None`, default: `None`) – The model-specific prefix representing the embedding task, without the trailing colon. For Nomic Embed, this can be `search_query`, `search_document`, `classification`, or `clustering`. Defaults to `search_document` or equivalent if known; otherwise, you must explicitly pass a prefix or an empty string if none applies.
- `dimensionality` (`int | None`, default: `None`) – The embedding dimension, for use with Matryoshka-capable models. Defaults to full-size.
- `long_text_mode` (`str`, default: `'mean'`) – How to handle texts longer than the model can accept. One of `mean` or `truncate`.
- `return_dict` (`bool`, default: `False`) – Return the result as a dict that includes the number of prompt tokens processed.
- `atlas` (`bool`, default: `False`) – Try to be fully compatible with the Atlas API. Currently, this means texts longer than 8192 tokens with `long_text_mode="mean"` will raise an error. Disabled by default.
- `cancel_cb` (`EmbCancelCallbackType | None`, default: `None`) – Called with arguments `(batch_sizes, backend_name)`. Return `True` to cancel embedding.
Returns:
- `Any` – With `return_dict=False`, an embedding or list of embeddings of your text(s).
- `Any` – With `return_dict=True`, a dict with keys `'embeddings'` and `'n_prompt_tokens'`.
Raises:
- `CancellationError` – If `cancel_cb` returned `True` and embedding was canceled.
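As a quick illustration of `return_dict`, a minimal sketch:

```python
from gpt4all import Embed4All

embedder = Embed4All()
result = embedder.embed('hello world', return_dict=True)
print(result['n_prompt_tokens'])  # number of prompt tokens processed
print(result['embeddings'])       # the embedding(s) for the input
```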