Embedded Databases
Embedded databases are lightweight, efficient databases embedded within applications to manage data locally without requiring a separate database server. They are designed to be easy to use and integrate seamlessly into applications, making them ideal for edge, mobile, and desktop environments. Examples of popular embedded databases include:
SQLite: A self-contained, serverless, zero-configuration SQL database engine. It is widely used in mobile applications, desktop applications, and IoT devices.
DuckDB: An in-process SQL OLAP database management system. It is designed to support analytical queries, making it suitable for data analysis and scientific computing.
Apache Derby: An open-source, embedded relational database implemented entirely in Java. It is easy to embed in Java applications and offers a lightweight solution for database management.
H2: A Java-based, open-source, lightweight, embedded relational database engine. It offers fast performance and is commonly used for testing, development, and small-scale applications.
HyperSQL (HSQLDB): A relational database management system written in Java. It supports a wide range of SQL standards, provides in-memory and disk-based tables, and is often used for development, testing, and lightweight production applications.
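As a small illustration of what "in-process" means, the sketch below uses SQLite through Python's standard sqlite3 module. The database is opened, written, and queried inside the application process with no server to start or configure. The readings table and its values are made up for the example.

```python
import sqlite3

# Open an in-process database. ":memory:" keeps it in RAM;
# a file path here would persist it to disk instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")

with conn:  # commits on success, rolls back if an exception is raised
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [("t1", 21.5), ("t1", 22.0), ("t2", 19.0)],
    )

rows = conn.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)
```

The Java-based engines in the list follow the same pattern through JDBC: the application loads a driver and opens a connection to a local file or an in-memory database.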
The Need for Real-Time Replication and Consolidation
As edge computing deployments grow, so does the need to replicate and consolidate data in real time from embedded databases into cloud-hosted databases. Edge, mobile, and desktop applications generate large volumes of data that must be stored, processed, and analyzed efficiently. Real-time replication keeps the consolidated copy up to date, which provides several benefits:
Improved Data Availability: Ensures that data is available for analytics and decision-making in real time.
Enhanced Performance: Reduces latency by processing data closer to the source and consolidating it into powerful cloud databases for further analysis.
Challenges
Despite its importance, real-time replication and consolidation from embedded databases remains a challenging and largely unsolved problem, for several reasons:
Lack of Built-in Change Data Capture (CDC) Functionalities: Most embedded databases do not have built-in CDC capabilities, making it difficult to track and replicate changes in real time.
Comprehensive Handling: It is challenging to perform replication comprehensively, covering all types of Data Definition Language (DDL) and Data Manipulation Language (DML) operations while preserving transactional semantics.
Framework Complexity: Developing a framework that supports multiple embedded databases from numerous source applications and a wide range of databases, data warehouses, and data lakes at the destination adds complexity.
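To make the first two challenges concrete, a common workaround when an engine has no built-in CDC is to capture changes with triggers. The sketch below does this for SQLite using Python's standard sqlite3 module. The items and change_log tables and the trigger names are hypothetical, and this is not the approach described for SyncLite in the next section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);

-- Hypothetical change-log table; seq gives a total order of changes.
CREATE TABLE change_log (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,
    op      TEXT,      -- 'I' insert, 'U' update, 'D' delete
    item_id INTEGER,
    name    TEXT
);

CREATE TRIGGER items_ins AFTER INSERT ON items BEGIN
    INSERT INTO change_log (op, item_id, name) VALUES ('I', NEW.id, NEW.name);
END;
CREATE TRIGGER items_upd AFTER UPDATE ON items BEGIN
    INSERT INTO change_log (op, item_id, name) VALUES ('U', NEW.id, NEW.name);
END;
CREATE TRIGGER items_del AFTER DELETE ON items BEGIN
    INSERT INTO change_log (op, item_id, name) VALUES ('D', OLD.id, NULL);
END;
""")

# One transaction containing three DML operations.
with conn:
    conn.execute("INSERT INTO items (id, name) VALUES (1, 'bolt')")
    conn.execute("UPDATE items SET name = 'nut' WHERE id = 1")
    conn.execute("DELETE FROM items WHERE id = 1")

changes = conn.execute(
    "SELECT op, item_id, name FROM change_log ORDER BY seq"
).fetchall()
print(changes)
```

The sketch also shows why this workaround falls short of comprehensive handling. The triggers must be written per table, they record DML only (a later ALTER TABLE or DROP TABLE leaves no entry in change_log), and the log does not mark where one transaction ends and the next begins.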
SyncLite
SyncLite addresses these challenges with a log-based approach to embedded database replication and consolidation, built from three components:
SyncLite Logger: A single Java library (JDBC driver) that encapsulates popular embedded databases (SQLite, DuckDB, Apache Derby, H2, and HyperSQL (HSQLDB)), allowing user applications to perform transactional operations on them while capturing those operations and writing them into log files.
Staging Storage: The log files are continuously staged on a configurable staging storage such as S3, MinIO, Kafka, or SFTP.
SyncLite Consolidator: A Java application that continuously scans these log files from the configured staging storage, reads incoming command logs, translates them into change-data-capture logs, and applies them onto one or more configured destination databases. It includes advanced features such as table/column/value filtering and mapping, trigger installation, fine-tunable writes, and support for multiple destinations.
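The flow through these three components can be sketched in a few lines. The code below is a toy illustration in Python, not SyncLite's API. LoggingConnection and consolidate are hypothetical names, a temporary directory stands in for the staging storage, and a second in-memory SQLite database stands in for the destination. It also replays the logged SQL commands directly rather than translating them into change-data-capture logs as the Consolidator is described as doing.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


class LoggingConnection:
    """Hypothetical wrapper: runs SQL on a local embedded database and writes
    each committed transaction as one JSON-lines file in a staging directory."""

    def __init__(self, db_path, staging_dir):
        self.conn = sqlite3.connect(db_path)
        self.staging = Path(staging_dir)
        self.pending = []  # commands of the currently open transaction
        self.seq = 0       # sequence number used to order log files

    def execute(self, sql, params=()):
        cur = self.conn.execute(sql, params)
        self.pending.append({"sql": sql, "params": list(params)})
        return cur

    def commit(self):
        self.conn.commit()
        # Write the log file only after the local commit has succeeded.
        path = self.staging / f"{self.seq:08d}.log"
        path.write_text("\n".join(json.dumps(c) for c in self.pending))
        self.pending, self.seq = [], self.seq + 1


def consolidate(staging_dir, dest):
    """Replay staged log files in sequence order onto a destination connection.
    Each file is applied inside its own 'with dest' block, which commits at the end."""
    for path in sorted(Path(staging_dir).glob("*.log")):
        with dest:
            for line in path.read_text().splitlines():
                cmd = json.loads(line)
                dest.execute(cmd["sql"], cmd["params"])
        path.unlink()  # remove the file once it has been applied


with tempfile.TemporaryDirectory() as staging:
    edge = LoggingConnection(":memory:", staging)
    edge.execute("CREATE TABLE t (id INTEGER, v TEXT)")
    edge.execute("INSERT INTO t VALUES (?, ?)", (1, "a"))
    edge.commit()
    edge.execute("UPDATE t SET v = ? WHERE id = ?", ("b", 1))
    edge.commit()

    dest = sqlite3.connect(":memory:")  # stands in for a cloud database
    consolidate(staging, dest)
    replicated = dest.execute("SELECT id, v FROM t").fetchall()
    print(replicated)
```

Because the application's commands are logged at the connection layer, both DDL and DML are captured and transaction boundaries are kept, which are the properties the challenges above call for. A real implementation would additionally need durable, atomic log writes, retry handling, and SQL dialect translation for each destination.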
Get Started
SyncLite Logger: Refer to the GitHub repository syncliteio/SyncLiteLoggerJava (repository to distribute SyncLite Logger for Java).
SyncLite Consolidator: Refer to the Docker Hub repository syncliteio/synclite-consolidator.
SyncLite EdgeDB (also known as SyncLite Logger) is integrated into edge applications and captures their transactional activity by logging it into files. These log files are shipped to a configurable, centralized staging storage. SyncLite Consolidator, deployed on a cloud VM or a centralized host, then processes the transaction logs and replays them onto the user's chosen cloud-hosted database, data warehouse, or data lake.