Vector Database Options for AWS
As machine learning and AI evolve, databases that can handle vector data have become increasingly important. Vector databases store, index, and query high-dimensional vector data efficiently, which underpins AI applications such as search, recommendations, and natural language processing.
AWS provides several options for managing vector data, each with unique features, use cases, and advantages. However, identifying the best one can take time and effort. In this article, we’ll explain vector databases and discuss the options available on AWS to help you choose the right one for your needs.
What Is a Vector Database For?
A vector database is designed for storing and retrieving vector embeddings: numerical representations of data that capture the semantic information AI applications rely on. AI models such as large language models (LLMs) generate these embeddings, which encode different characteristics of an object. The embeddings are then stored in the vector database along with each vector’s metadata, so queries can combine similarity search with metadata filters.
Vector databases also allow quick and efficient lookup of the nearest neighbors in an n-dimensional space. As with other data types, you need an index to query vectors efficiently, and these databases support specialized vector indexes. Since vectors don’t have a natural ordering the way text and numbers do, the usual query is a k-nearest neighbors (kNN) search, which finds the k items closest to a query vector under a distance metric.
However, an exact kNN search calculates the similarity between the query vector and every other vector in the database, which is computationally expensive for large vector datasets. In such cases, a more efficient approach is approximate nearest neighbor (ANN) search, which relies on index structures such as HNSW (hierarchical navigable small world) or IVF (inverted file index) and trades some accuracy for substantial speed improvements.
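To make the cost of exact search concrete, here is a minimal sketch of a brute-force kNN lookup in plain Python. The three-dimensional vectors and item names are made up for illustration; real embeddings have hundreds or thousands of dimensions and come from a model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_knn(query, vectors, k):
    """Score every stored vector against the query and keep the top k.

    The work grows linearly with the number of stored vectors, which is
    the expense that ANN indexes are designed to avoid.
    """
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]

# Toy "embeddings" (illustrative values, not model output).
vectors = {
    "sweater": [0.9, 0.1, 0.0],
    "cardigan": [0.8, 0.2, 0.1],
    "laptop": [0.0, 0.1, 0.9],
}
nearest = exact_knn([1.0, 0.0, 0.0], vectors, k=2)
print(nearest)
```

Every query touches every stored vector, so latency scales with dataset size. An ANN index avoids that by visiting only a subset of candidates.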
Vector database use cases
RAG
In RAG or retrieval-augmented generation, a vector database provides LLMs with additional context about the query they’re given. It’s used for genAI applications such as question-answer apps and chatbots.
Instead of sending the prompt directly to the LLM, engineers first create vector embeddings from existing data that should inform the response, such as technical specs, research data, or product descriptions, and save those embeddings in the database. At query time, the RAG application searches for the documents and data sources most relevant to the user query and passes them as context to the LLM, which can then give a more accurate and helpful response.
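The retrieve-then-prompt flow can be sketched in a few lines of Python. This is an illustration under loud assumptions: `embed` is a hypothetical stand-in that counts words instead of calling an embedding model, the in-memory list stands in for a vector database, and the documents are invented.

```python
import math
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_prompt(question, store, top_k=1):
    """Retrieve the top_k most similar documents and prepend them as context."""
    query_vec = embed(question)
    ranked = sorted(store, key=lambda item: similarity(query_vec, item[1]), reverse=True)
    context = "\n".join(text for text, _ in ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {question}"

# Invented documents, embedded once and kept alongside their text.
documents = [
    "The X100 router supports Wi-Fi 6 and has four Ethernet ports.",
    "Our return policy allows refunds within 30 days of purchase.",
]
store = [(text, embed(text)) for text in documents]

prompt = build_prompt("How many Ethernet ports does the X100 router have?", store)
print(prompt)
```

The resulting prompt, rather than the bare question, is what would be sent to the LLM. In a real application, the ranking step would be a similarity query against the vector database.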
Chatbot memory
Vector databases can significantly enhance chatbot functionality by serving as memory systems. Outputs from the LLM can be converted into vectors and stored as a cache. When the chatbot comes across a similar query, it can reuse the stored answer and avoid an extra LLM inference call.
These stored LLM outputs can also provide context for future conversations, ensuring relevance and coherence in ongoing dialogues.
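One way to picture the cache is the sketch below. It assumes answers are keyed by the embedding of the query that produced them and that a cosine similarity of 0.95 counts as "similar enough". Both choices are illustrative, and the hand-written vectors stand in for real model embeddings.

```python
import math

class SemanticCache:
    """Caches answers by query embedding; a lookup hits when similarity >= threshold."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    def store(self, embedding, answer):
        self.entries.append((list(embedding), answer))

    def lookup(self, embedding):
        """Return the cached answer closest to the embedding, or None on a miss."""
        best = max(self.entries, key=lambda e: self._cosine(embedding, e[0]), default=None)
        if best is not None and self._cosine(embedding, best[0]) >= self.threshold:
            return best[1]
        return None

cache = SemanticCache(threshold=0.95)
cache.store([0.9, 0.1, 0.0], "Our store opens at 9 a.m.")

hit = cache.lookup([0.88, 0.12, 0.01])   # close to the stored query
miss = cache.lookup([0.0, 0.2, 0.9])     # unrelated query
```

On a hit, the chatbot returns the stored answer. On a miss, it calls the LLM and stores the new answer for next time. The threshold controls the trade-off between saved inference calls and the risk of serving an answer to a question that only looks similar.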
Natural language search
Natural language search uses vector databases to match the meaning of user queries with relevant items instead of just returning exact keyword matches. Data items like product descriptions and user queries are first converted into vector embeddings.
The system then looks for vectors that are semantically similar to the query vector using nearest-neighbor search algorithms. As a result, it returns items that are semantically related to the user query, even if they don’t contain the exact search terms. For instance, searching for a “warm, comfortable sweater” will return relevant product results even if those specific words aren’t in the title or description.
Data cataloging
Vector databases also improve data cataloging by encoding metadata and using semantic matching to enhance data search and discovery. Labels and descriptions of data assets are first converted to vector embeddings, which the system then uses to find related data assets. This improves search and surfaces semantic relations among data assets that traditional search methods might miss.
AI agents
Vector databases allow AI agents to quickly access and process company-specific data, enabling them to perform various tasks efficiently. For example, AI agents can analyze customer interaction data to provide insights for marketing strategies, using vector databases to find and process relevant information.
Video and image search
Vector databases can also help search through frames of videos or image collections, enhancing the capabilities for content discovery and object recognition. Plus, combining image embeddings with metadata allows for more refined searches.
Options for AWS Vector Databases
Vector databases available for AWS can be categorized into three types: standalone vector databases, Amazon RDS PostgreSQL with pgvector, and AI and vector data on Timescale Cloud. Let’s look at these three in detail.
Standalone vector databases
Standalone vector databases are specifically designed to manage and query high-dimensional vector data efficiently. They provide specialized algorithms for nearest neighbor search, such as k-nearest neighbors, which are essential for AI and machine learning applications.
An example of a native option on AWS is Amazon OpenSearch Service, a managed search and analytics service that also supports vector search. It can handle large datasets and integrates well with other AWS services. Other notable options available on the AWS Marketplace include:
- Pinecone: A managed vector database designed for high-performance vector similarity search
- Qdrant: An open-source vector similarity search engine and database optimized for ease of use and speed
- Zilliz Cloud: A cloud-native vector database built by the creators of Milvus
- Weaviate: An open-source vector search engine that uses machine learning to scale and manage vector data efficiently
- Astra DB: A database-as-a-service built on Apache Cassandra
- Activeloop Deep Lake: A database for deep learning that stores complex data types as tensors and allows querying and retrieving large-scale vector data
While all these standalone vector databases are quite powerful, they introduce several challenges.
Extra engineering complexity
Integrating standalone vector databases into your existing data stack adds to engineering complexity. You need application-level plumbing to handle data duplication, synchronization, and updates to ensure that the vector database stays in sync with the rest of your data stack, such as analytics and relational data.
Even with cloud databases, there’s operational overhead associated with managing data across multiple platforms. First, another component in your architecture is an additional cost in itself. More components also slow down development, since there are more moving pieces for developers to understand. But the real cost lies in maintaining these components and keeping them updated, which grows quickly once you account for data governance, consistency, migrations, and backups.
Plus, the fragmentation of data across multiple platforms makes RAG applications painful. In addition to having conflicting sources of truth, users find that managing and scaling RAG apps is difficult when the data is spread across different systems.
Learning curve
Adopting and integrating a new vector database into an existing workflow means learning new systems, APIs, optimization techniques, and tools. It also means learning a new query language and syntax to write correct queries, which costs time and resources.
Unsure about future development
Many standalone vector databases are relatively new. They emerged from the recent explosion in AI, with every company hoping to capitalize on it, which raises questions about the long-term viability and support of these projects. Plus, AWS has a history of releasing features quickly, sometimes without a long-term commitment. Since the database is the foundation of your AI application, choosing one that might not be supported or developed in the future is risky.
Amazon RDS PostgreSQL with pgvector
Amazon RDS PostgreSQL is a widely used managed database service on AWS known for its reliability, scalability, and ease of use. It supports pgvector, a powerful PostgreSQL extension that allows you to store high-dimensional vectors in PostgreSQL tables. Pgvector adds a dedicated data type for vector representation, which enables you to store and retrieve vector data efficiently.
Pgvector lets you perform similarity searches using different vector similarity metrics, such as Euclidean distance or cosine similarity. This facilitates applications like clustering and kNN search. It also integrates with SQL queries seamlessly, enabling you to combine similarity search with other aggregation or filtering operations for more complex data analysis.
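As a rough illustration of what combining similarity search with SQL filtering looks like, the sketch below builds a parameterized pgvector query in Python. The operators (`<->`, `<=>`, `<#>`) and the `vector(n)` column type come from pgvector. The `products` table, its columns, and the helper functions are hypothetical, and the `%s` placeholders assume a driver such as psycopg.

```python
# pgvector distance operators.
OPERATORS = {
    "l2": "<->",             # Euclidean distance
    "cosine": "<=>",         # cosine distance (1 - cosine similarity)
    "inner_product": "<#>",  # negative inner product
}

# Hypothetical schema: a 3-dimensional column keeps the example readable.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS products (
    id bigserial PRIMARY KEY,
    category text,
    description text,
    embedding vector(3)
);
"""

def to_vector_literal(values):
    """Format a Python sequence in pgvector's text form, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

def similarity_query(table, metric="cosine", category_filter=False):
    """Build a kNN query; `table` must be a trusted identifier, not user input."""
    where = "WHERE category = %s " if category_filter else ""
    return (
        f"SELECT id, description FROM {table} "
        f"{where}"
        f"ORDER BY embedding {OPERATORS[metric]} %s::vector LIMIT %s"
    )

sql = similarity_query("products", metric="cosine", category_filter=True)
params = ("knitwear", to_vector_literal([0.9, 0.1, 0]), 5)
print(sql)
print(params)
```

With an open database connection, `cursor.execute(sql, params)` would return the five products in the `knitwear` category whose embeddings are closest to the query vector. The filter and the similarity ranking run in a single SQL statement.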
Using pgvector on RDS has several benefits, especially for teams that are already familiar with PostgreSQL, including:
- Teams with prior PostgreSQL experience can quickly get started with pgvector without needing to learn a new database system, reducing the learning curve and speeding up development.
- RDS provides a fully managed database experience and takes care of backups, patching, scaling, and other operational tasks, reducing your team's maintenance burden and allowing them to focus on development.
- You can benefit from the power of vector embeddings without adopting a new database architecture, which simplifies data management and enhances existing workflows.
- RDS is a mature and well-supported platform with features like disaster recovery, automated backups, and replication, ensuring your data is safe and your applications are highly available.
Despite all these benefits, RDS PostgreSQL with pgvector has some limitations, especially when it comes to large-scale AI applications.
Scaling problems
RDS PostgreSQL might not be the best option for AI systems that require high performance (low latency) and high scale (handling millions or billions of vectors). In such scenarios, the database’s underlying structure can become a bottleneck.
Tricky pricing and cost control
Managing costs with RDS can be quite challenging, especially since the pricing model can be complex and might lead to unexpected expenses. The pricing formula builds upon database instances and storage and includes extra charges for additional features such as multi-AZ deployments, technical support, backup storage, and data transfer.
The more resources you use, the higher your cost. To estimate the pricing accurately, you’ll need to select the right instance size and consider the resources you need for your workload, which can be difficult.
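A back-of-the-envelope estimate can be sketched as a sum of the components listed above. Every rate in this example is a placeholder, not an AWS price, and treating a multi-AZ deployment as a multiplier on instance and storage cost is a simplifying assumption. Check the current RDS pricing page for real numbers.

```python
HOURS_PER_MONTH = 730  # common approximation for an average month

def estimate_monthly_cost(instance_rate, storage_gb, storage_rate,
                          backup_gb=0.0, backup_rate=0.0,
                          transfer_gb=0.0, transfer_rate=0.0,
                          multi_az_factor=1.0):
    """Sum instance, storage, backup, and transfer charges.

    All rates are supplied by the caller; multi_az_factor scales the
    instance and storage components (an assumption for illustration).
    """
    instance = instance_rate * HOURS_PER_MONTH * multi_az_factor
    storage = storage_gb * storage_rate * multi_az_factor
    backup = backup_gb * backup_rate
    transfer = transfer_gb * transfer_rate
    return {
        "instance": instance,
        "storage": storage,
        "backup": backup,
        "transfer": transfer,
        "total": instance + storage + backup + transfer,
    }

# Placeholder rates in USD, chosen only to make the arithmetic easy to follow.
estimate = estimate_monthly_cost(
    instance_rate=0.50, storage_gb=500, storage_rate=0.10,
    backup_gb=200, backup_rate=0.02,
    transfer_gb=100, transfer_rate=0.09,
    multi_az_factor=2.0,
)
print(estimate)
```

Even in this simplified form, the total depends on eight inputs, which is why estimating a bill means knowing your instance size, storage growth, and traffic in advance.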
Expensive dedicated support
If you want deeply consultative support to help with production on AWS, you’ll need to pay more than $5,000 per month, which totals more than $60,000 per year (while Timescale provides this for free for cloud customers). With lower tiers, you only get a community forum and, at most, receive only general advice.
AI and vector data on Timescale Cloud (pgvector, pgai, and pgvectorscale)
Timescale Cloud is not only a managed PostgreSQL optimized for time series, events, and analytics but also a managed AI product offering on Timescale’s cloud PostgreSQL platform hosted on AWS. It combines the purpose-built performance of a specialized vector database with the ease of use and familiarity of PostgreSQL and pgvector. It also enhances pgvector with higher recall, faster search, and efficient time-based filtering, making it ideal for production AI applications. Here’s why it makes for an excellent vector database option for AWS:
Single source of truth
Timescale Cloud simplifies your application stack by providing a single place for all the data that powers your AI applications, including vector embeddings, event data, relational data, and time-series data. You don’t need to add another moving part to your infrastructure just for vectors, simplifying data engineering plumbing by just using PostgreSQL. Plus, it minimizes operational complexity due to synchronization and data duplication.
High-performance search
In addition to providing HNSW and IVFFlat indexing algorithms, Timescale enhances pgvector through pgvectorscale. This open-source PostgreSQL extension enables developers to build more scalable AI applications with higher-performance embedding search and cost-efficient storage. Pgvectorscale’s Streaming DiskANN includes support for statistical binary quantization (SBQ), a novel binary quantization method developed by researchers at Timescale.
This enables faster vector search than Pinecone in a benchmark on a dataset of 50 million Cohere embeddings (768 dimensions each). With pgvector and pgvectorscale, good old PostgreSQL outperformed Pinecone’s storage-optimized index (s1) with 28x lower p95 latency and 16x higher query throughput for approximate nearest neighbor queries at 99% recall. See the pgvector vs. Pinecone benchmark for the complete results and configurations.
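To give a feel for why quantization helps, here is a sketch of plain sign-based binary quantization. This is deliberately not SBQ, whose details aren't covered in this article. It is the simpler, generic technique: keep one bit per dimension and compare vectors by Hamming distance. The four-dimensional vectors are invented.

```python
def quantize(vector):
    """Pack one bit per dimension into an int: 1 if the value is positive, else 0."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits

def hamming_distance(a, b):
    """Number of bit positions in which two quantized vectors differ."""
    return bin(a ^ b).count("1")

query = quantize([0.4, -0.2, 0.7, -0.9])   # bits 1010
near = quantize([0.5, -0.1, 0.6, 0.1])     # bits 1011
far = quantize([-0.3, 0.8, -0.5, 0.2])     # bits 0101

print(hamming_distance(query, near))
print(hamming_distance(query, far))
```

Each dimension shrinks from a 32-bit float to a single bit, and comparing two vectors becomes an XOR plus a bit count. The price is lost precision, which is why quantized indexes typically re-rank the top candidates using the full vectors.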
Time and metadata search capabilities
Timescale also optimizes hybrid time-based vector search to find the most recent vectors, vectors older than a specific date, or vectors within a particular time period. It leverages hypertables (TimescaleDB’s time-based partition system) to find recent embeddings, store and retrieve LLM responses easily, and limit the vector search by document age or time range. This is useful for applications with “live” vector data, such as events, news, videos, and social media posts, and static documents with timestamps.
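The logic of a time-filtered vector search can be sketched in memory as follows. This only mirrors the query semantics (restrict by time, then rank by distance). It does not model hypertables or partition pruning, and the rows and field names are invented for the example.

```python
from datetime import datetime

def search_recent(query, rows, since, k):
    """Keep rows created at or after `since`, then rank by squared Euclidean distance."""
    def distance(vec):
        return sum((x - y) ** 2 for x, y in zip(query, vec))

    recent = [row for row in rows if row["created_at"] >= since]
    ranked = sorted(recent, key=lambda row: distance(row["embedding"]))
    return [row["title"] for row in ranked[:k]]

# Invented rows: the oldest one is the closest match but falls outside the window.
rows = [
    {"title": "Old launch recap", "created_at": datetime(2023, 1, 10), "embedding": [0.9, 0.1]},
    {"title": "New launch post", "created_at": datetime(2024, 6, 1), "embedding": [0.8, 0.2]},
    {"title": "Hiring update", "created_at": datetime(2024, 6, 2), "embedding": [0.1, 0.9]},
]

results = search_recent([1.0, 0.0], rows, since=datetime(2024, 5, 1), k=1)
print(results)
```

In a time-partitioned table, the database can skip partitions outside the requested window entirely, so the similarity search runs over far fewer vectors than a scan of the whole table.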
Reliability
Since Timescale extends PostgreSQL, a long-standing industry standard tool, it inherits its robustness and reliability, giving you peace of mind. To enhance reliability, Timescale’s cloud platform also has one-click HA, forking, and read replication.
Data retention and cost management
Timescale also includes built-in features to manage data retention, making it easy to implement data lifecycle policies that remove what you no longer want quickly and easily without impacting your application. Deleting old data also helps you save on storage costs. You can also combine continuous aggregates with your data retention policies to automatically downsample your data.
To further reduce costs, Timescale also offers bottomless, consumption-based storage built on S3. With access to the object storage layer right from the database, you can tier data from your database to S3, store as much data as you want, and only pay for what you store. Despite this, you can still use standard SQL to query the data in S3 from within your database.
Comparing AWS Vector Databases
To bring it all together, let’s summarize the capabilities and options of these three types of AWS vector databases in a single table:
Conclusion
Vector databases are powerful tools for developing AI systems. They enable efficient storage and retrieval of high-dimensional vector data, which is crucial for various AI applications. While standalone databases are a great option, they often come with complexities that can add operational overhead and integration challenges.
Amazon RDS PostgreSQL with pgvector provides a simpler alternative and leverages the widely used PostgreSQL database with an extension specifically designed for vector data. While it’s easy to adopt, it can get quite expensive and run into scaling problems.
For those looking for enhanced performance and scalability for production AI apps, Timescale builds on pgvector to deliver high-powered vector search capabilities while maintaining the simplicity and familiarity of PostgreSQL. This is your open-source PostgreSQL stack for AI applications in the AWS cloud.
Ultimately, your exact needs will dictate which vector database is the best option for your application. But if you’re looking to experience the reliability of PostgreSQL coupled with advanced search capabilities, try Timescale for free today.