All notable changes to the OAIEvals Collector project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Elasticsearch Support: The OAIEvals Collector now includes Elasticsearch as a target for collecting and storing event data. With this addition, the collector can connect to Elasticsearch instances and write event data to them.
- [BETA] MongoDB support: The OAIEvals Collector now includes MongoDB as a target for collecting and storing event data. With the new addition, the collector can connect, interact, and write event data to MongoDB databases. The MongoDB support allows for flexible, document-based data modeling and can handle a wide variety of data types. Required environment variables: MONGODB_URI, MONGODB_DATABASE, MONGODB_COLLECTION.
- Kafka Improvements:
- Environment Variable Validation: The application now checks if the KAFKA_BOOTSTRAP_SERVERS environment variable exists and is valid. If not, the application will exit or alert the user that it can't proceed without this value.
- Buffer Size Configuration: The size of the message buffer used for batching Kafka messages is now configurable, allowing for optimization based on the specific use case and system resources.
- Periodic Flushing of Message Buffer: A background goroutine is introduced to periodically flush the message buffer to Kafka, ensuring that messages do not remain in the buffer for an excessively long time. The interval at which the buffer is flushed is configurable.
- Exponential Backoff: Implemented exponential backoff when attempting to write messages to Kafka. This helps to handle temporary Kafka unavailability or high load scenarios by retrying failed attempts with increasing delays.
- Mutex for Message Buffer: A mutex is added to protect the message buffer from concurrent access, ensuring thread-safety when appending messages to the buffer.
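The buffering changes above can be sketched in plain Go. This is an illustrative sketch only, not the collector's actual code: `writeFunc`, `buffer`, `retryWithBackoff`, the retry count of 4, and the 10 ms starting delay are all hypothetical, and `writeFunc` stands in for whatever call the real Kafka writer exposes.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// writeFunc is a hypothetical stand-in for the real Kafka batch write.
type writeFunc func(batch []string) error

// retryWithBackoff retries write up to maxAttempts times, doubling the delay each time.
func retryWithBackoff(write writeFunc, batch []string, maxAttempts int, delay time.Duration) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = write(batch); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

// buffer collects messages behind a mutex and flushes them in batches.
type buffer struct {
	mu         sync.Mutex
	msgs       []string
	size       int
	write      writeFunc
	stop, done chan struct{}
}

// newBuffer starts a goroutine that flushes the buffer every interval.
func newBuffer(size int, interval time.Duration, write writeFunc) *buffer {
	b := &buffer{size: size, write: write, stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(b.done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				_ = b.flush() // a real collector would log this error
			case <-b.stop:
				return
			}
		}
	}()
	return b
}

// add appends a message and flushes once the configured size is reached.
func (b *buffer) add(msg string) error {
	b.mu.Lock()
	b.msgs = append(b.msgs, msg)
	full := len(b.msgs) >= b.size
	b.mu.Unlock()
	if full {
		return b.flush()
	}
	return nil
}

// flush takes the pending batch under the lock and writes it without holding
// the lock. A batch that still fails after all retries is dropped here.
func (b *buffer) flush() error {
	b.mu.Lock()
	batch := b.msgs
	b.msgs = nil
	b.mu.Unlock()
	if len(batch) == 0 {
		return nil
	}
	return retryWithBackoff(b.write, batch, 4, 10*time.Millisecond)
}

// shutdown stops the background goroutine, then flushes what is left.
func (b *buffer) shutdown() error {
	close(b.stop)
	<-b.done
	return b.flush()
}

func main() {
	// Print batches instead of talking to a real broker.
	b := newBuffer(3, 50*time.Millisecond, func(batch []string) error {
		fmt.Println("wrote batch:", batch)
		return nil
	})
	for _, m := range []string{"a", "b", "c", "d"} {
		_ = b.add(m)
	}
	_ = b.shutdown()
}
```

Swapping the slice out under the lock keeps `add` from blocking while a write is being retried. The cost in this sketch is that a batch is lost if every retry fails; a production implementation would likely re-queue or report it.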
- Kafka Improvements:
- Batch Message Writing: The application now uses the WriteMessages function of the Kafka writer to write messages in batches, improving throughput and reducing the overhead of individual message writes.
- Shutdown Procedure: The shutdown procedure now ensures that any remaining messages in the buffer are flushed to Kafka before closing the Kafka writer. It also properly stops the background flushing goroutine.
- Kafka Writer Initialization: Fixed an issue where the Kafka writer was not properly initialized when the KAFKA_BOOTSTRAP_SERVERS environment variable was not set.
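As a rough illustration of the KAFKA_BOOTSTRAP_SERVERS check, the sketch below assumes the variable holds a comma-separated list of host:port pairs. The changelog does not define what counts as a valid value, so that format and the helper name `parseBootstrapServers` are assumptions rather than the collector's actual behavior.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// parseBootstrapServers validates a comma-separated list of host:port pairs.
// The accepted format is an assumption made for this sketch.
func parseBootstrapServers(raw string) ([]string, error) {
	if strings.TrimSpace(raw) == "" {
		return nil, fmt.Errorf("KAFKA_BOOTSTRAP_SERVERS is not set or empty")
	}
	var brokers []string
	for _, part := range strings.Split(raw, ",") {
		addr := strings.TrimSpace(part)
		host, port, err := net.SplitHostPort(addr)
		if err != nil {
			return nil, fmt.Errorf("invalid broker address %q: %w", addr, err)
		}
		if host == "" || port == "" {
			return nil, fmt.Errorf("invalid broker address %q: host and port are required", addr)
		}
		brokers = append(brokers, addr)
	}
	return brokers, nil
}

func main() {
	brokers, err := parseBootstrapServers(os.Getenv("KAFKA_BOOTSTRAP_SERVERS"))
	if err != nil {
		// Exit early rather than create a writer with no brokers.
		fmt.Fprintln(os.Stderr, "cannot start Kafka target:", err)
		os.Exit(1)
	}
	fmt.Println("using Kafka brokers:", brokers)
}
```

Validating before the writer is constructed means a missing variable surfaces as a clear startup error instead of a half-initialized writer.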
- Kafka support: We've added Kafka as a target for collecting and storing event data. The OAIEvals Collector can now connect, interact with Kafka, and write messages to Kafka topics at a high rate, thus handling large volumes of event data. The Kafka support includes the buffering of messages for faster ingestion and asynchronous writing. Required environment variable: KAFKA_BOOTSTRAP_SERVERS.
- TimescaleDB support: We've added TimescaleDB as a target for collecting and storing event data. The OAIEvals Collector can now connect and interact with TimescaleDB, providing flexibility in handling different types of event data and supporting time-series analyses.
- Loki support: We've added Loki as a target for collecting and storing log data. The OAIEvals Collector can now connect and interact with Loki, complementing the existing InfluxDB integration. This offers more flexibility in handling different types of metrics, providing context and qualitative information around numeric metrics.
- Initial release: Created OAIEvals Collector, a Go application designed to collect and store raw evaluation metrics.
- HTTP Handler & FileSystem Watcher: These components are responsible for receiving metric data.
- Integration with InfluxDB: For robust and efficient data storage, we've integrated the collector with InfluxDB to handle numeric time-series data.
- Containerized Deployment: The application supports deployment with Docker and orchestration with Docker Compose, making it easy to use and scalable.