Creating your first dataset
VecDB is a NoSQL database and works much like popular document databases such as MongoDB.
You can ingest JSON into our database simply using the .
To walk you through this, we have written the following Python tutorial as an example. Each subsequent guide assumes you are continuing from where the previous one left off.
Here is an example in Python where we have downloaded a Python dataset.
Once you have downloaded your dataset, the next steps are to format it into JSON documents (Python dictionaries) and then insert them into VecDB.
The "_id" attribute uniquely identifies each document. If you insert documents without an "_id" attribute, one is generated automatically for you. This can be problematic if an insertion fails and you need to investigate why, because you cannot easily tell which documents were affected. We therefore recommend always including the "_id" field in your documents where possible.
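As a rough illustration of the formatting step, here is a minimal sketch that turns a list of image file paths into Python dictionaries with explicit "_id" values. The helper name `to_documents`, the `image_path` field, and the example file names are hypothetical and chosen only for this sketch. It also assumes that file names are unique, so the file name without its extension can serve as the "_id".

```python
from pathlib import Path


def to_documents(image_paths):
    """Turn image file paths into JSON-style documents (Python dicts)."""
    documents = []
    for path in image_paths:
        p = Path(path)
        documents.append({
            # Explicit, stable id: the file name without its extension.
            # Assumes file names are unique across the dataset.
            "_id": p.stem,
            # Hypothetical field name holding the local image location.
            "image_path": str(p),
        })
    return documents


# Hypothetical example paths, used only to show the resulting shape.
docs = to_documents(["images/cat_001.jpg", "images/dog_002.jpg"])
```

Deriving the "_id" from something you already know about each record means that a failed insert can be traced back to the original file.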
From here, there are two ways to insert them: you can encode them first and then insert them, or you can insert and encode them in one go. VecDB supports both.
For this particular instance, since the images are stored locally, we will encode them on the fly (i.e. as we chunk through the docs, we will encode each chunk just before we insert it). This is explored in the next section!
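To make the "encode on the fly" idea concrete, here is a minimal, library-agnostic sketch of the pattern. It does not use the VecDB client: `encode_image` is a placeholder encoder, `insert_fn` stands in for whatever insert call you use, and the `image_vector` field name is an assumption made for illustration.

```python
from typing import Callable, Iterator, List


def chunk(docs: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive slices of `docs`, each with at most `size` items."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def encode_image(path: str) -> List[float]:
    """Placeholder encoder (hypothetical).

    A real encoder would load the image at `path` and run a model on it.
    """
    return [float(len(path)), 0.0, 0.0]


def insert_with_encoding(
    docs: List[dict],
    insert_fn: Callable[[List[dict]], None],
    chunk_size: int = 2,
) -> int:
    """Encode each chunk just before inserting it; return the number inserted."""
    inserted = 0
    for batch in chunk(docs, chunk_size):
        for doc in batch:
            # Adds the vector to the document in place (hypothetical field name).
            doc["image_vector"] = encode_image(doc["image_path"])
        insert_fn(batch)  # stand-in for the real insert call
        inserted += len(batch)
    return inserted


# Usage: an in-memory list plays the role of the database.
store: List[dict] = []
docs = [{"_id": f"img_{i}", "image_path": f"images/{i}.jpg"} for i in range(5)]
count = insert_with_encoding(docs, store.extend, chunk_size=2)
```

Encoding per chunk keeps memory use bounded and means that a failure partway through only affects the current chunk, not the documents already inserted.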