I recently had the chance to work with Denodo, a data virtualization platform. As I’m quite new to this kind of technology and to data engineering, I thought it would be a good opportunity to note down some of what I learned.
Data Virtualization
Denodo as a platform is used to virtualize data across an organization and create a single point of view. This is useful when data is scattered across different systems in various forms. Compared to the traditional data warehouse approach, where we integrate multiple data sources into a single database, data virtualization (as Denodo claims) reduces the need for ETL and allows quick and easy analysis across different data sources.
Build a Quick Demo with Denodo
Deploy a Denodo Instance on Azure
Denodo is available on Azure Marketplace, where we can simply deploy a VM with the Denodo image.
After making sure all Denodo services are up and running (I checked this by connecting to the Windows Server VM over RDP and inspecting the Denodo platform), we can connect to the Denodo Design Studio from a browser: http://{xx.xx.xx.xx}:9090/denodo-design-studio/#/
Replace xx.xx.xx.xx with the public IP address of your Denodo VM.
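Before opening the browser, I found it handy to check that the Design Studio port is actually reachable. Here's a minimal sketch: the port and path are taken from the URL above, while the helper names are my own (hypothetical) and the IP is a placeholder.

```python
import socket

# Port and path taken from the Design Studio URL in this post.
DESIGN_STUDIO_PORT = 9090
DESIGN_STUDIO_PATH = "/denodo-design-studio/#/"


def design_studio_url(public_ip: str) -> str:
    """Build the Design Studio URL for a VM's public IP (hypothetical helper)."""
    return f"http://{public_ip}:{DESIGN_STUDIO_PORT}{DESIGN_STUDIO_PATH}"


def port_is_open(host: str, port: int = DESIGN_STUDIO_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable: treat all of them as "not open".
        return False
```

For example, `design_studio_url("203.0.113.10")` gives the address to paste into the browser, and `port_is_open("203.0.113.10")` tells you whether anything is listening. If the port is closed, the VM's network security group in Azure may need an inbound rule for 9090.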
Connect to Data Sources
Now that our Denodo instance is up and running, we can try connecting to some sample data.
I chose a JDBC connection for my sample data source, which lives in Azure SQL Database.
Give the connection a name, select Azure SQL as the Database Adapter, and fill in your database's URL along with the credentials. Click the Test Connection option at the top right and you should see a notification confirming the connection was successful.
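The URL field is the fiddly part. Below is a rough sketch that builds a SQL Server JDBC URL in the format the Azure portal shows for the Microsoft JDBC driver. Whether Denodo's Azure SQL adapter wants exactly this form is an assumption on my part, so compare it with the template the adapter pre-fills. The server and database names are placeholders, and credentials are left out because Denodo asks for them in separate fields.

```python
def azure_sql_jdbc_url(server: str, database: str, port: int = 1433) -> str:
    """Build a SQL Server JDBC URL for an Azure SQL Database (hypothetical helper).

    `server` is the logical server name without the .database.windows.net suffix.
    """
    host = f"{server}.database.windows.net"
    props = {
        "database": database,
        "encrypt": "true",
        "trustServerCertificate": "false",
        "hostNameInCertificate": "*.database.windows.net",
        "loginTimeout": "30",
    }
    # The Microsoft driver expects semicolon-separated key=value properties.
    prop_str = ";".join(f"{key}={value}" for key, value in props.items())
    return f"jdbc:sqlserver://{host}:{port};{prop_str};"
```

For instance, `azure_sql_jdbc_url("myserver", "sampledb")` returns `jdbc:sqlserver://myserver.database.windows.net:1433;database=sampledb;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;`.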
Create a Base View from Connected Data Sources
In Denodo, a base view is a view that maps directly onto a connected data source; it can be created as part of connecting to a new data source.
Create Derived views
Derived views are the result of combining or transforming other views. They can be created through the Denodo GUI by choosing one of the derived view types it offers, such as a join or a union.
Alternatively, you can open a new VQL Shell and use CREATE VIEW statements (VQL is close to SQL). I personally find this more flexible and easier to use than the designer UI.
You can also publish derived views as a web API for further analytical use.
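I can't reproduce the VQL Shell here, so the sketch below uses Python's built-in sqlite3 as a stand-in to show the idea of a derived view: a CREATE VIEW that joins and aggregates two tables playing the role of base views. The table names, columns and data are made up, and this is standard SQL executed by SQLite, not VQL, although VQL's CREATE VIEW statements look similar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two tables standing in for base views (hypothetical names and data).
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0)],
)

# A "derived view": a join plus an aggregation over the two base tables.
cur.execute(
    """
    CREATE VIEW customer_totals AS
    SELECT c.name AS name, SUM(o.total) AS total_spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    """
)

rows = cur.execute("SELECT name, total_spent FROM customer_totals ORDER BY name").fetchall()
print(rows)  # [('Alice', 40.0), ('Bob', 40.0)]
```

The view stores no data of its own; every SELECT against it re-runs the join, which is roughly the mental model I have of derived views in Denodo too.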
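Once a view is published, consuming it from a script is just an HTTP call. Here's a minimal client sketch using only the standard library. It deliberately takes the endpoint URL as input (copy it from what Design Studio shows after publishing) rather than guessing Denodo's URL layout. HTTP Basic auth and a JSON response are assumptions you should check against your own deployment, and the function names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def with_query(endpoint: str, params: dict) -> str:
    """Append query parameters to an endpoint URL, keeping any existing query."""
    sep = "&" if urllib.parse.urlsplit(endpoint).query else "?"
    # safe="$" leaves OData-style names like "$format" readable instead of "%24format".
    return endpoint + sep + urllib.parse.urlencode(params, safe="$")


def fetch_json(endpoint: str, user: str, password: str, params=None, timeout: float = 10.0):
    """GET the endpoint with HTTP Basic auth and return the parsed JSON body.

    ASSUMPTION: the published service accepts Basic auth and returns JSON.
    The shape of the JSON depends on Denodo and is not assumed here.
    """
    url = with_query(endpoint, params) if params else endpoint
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping the URL as an input means the client doesn't break if the published path differs between Denodo versions or deployments.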
Some Thoughts and Research
At this stage you should have some grasp of how Denodo works: it creates a virtualized SQL layer on top of different data sources and returns an integrated result.
Is Denodo good? I think it’s definitely useful for some specific use cases. However, since data virtualization does not replicate data, will querying live source systems directly cause any trouble, for example by adding load to them?
Apart from that, I think Denodo serves more as a one-way data integration platform: it can serve aggregated data to a reporting layer, but is it possible to operate on that data if needed? Reading through the OpenAPI documentation for manipulating Denodo views, I could only find endpoints for getting data and posting new rows. That kind of makes sense, as I imagine updating an aggregated row wouldn’t be an easy task: how would the change be reflected back in the source systems?
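To make "a virtualized layer that returns an integrated result" a bit more concrete, here's a toy sketch of query-time federation. Two separate SQLite databases stand in for two source systems, and a small function queries both live and joins the results in memory, without first copying anything into a warehouse. It only illustrates the idea; Denodo's query engine is much more involved, and none of these names come from Denodo.

```python
import sqlite3


def make_crm() -> sqlite3.Connection:
    """Source system 1: a CRM database holding customers (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    return conn


def make_sales() -> sqlite3.Connection:
    """Source system 2: a separate sales database holding orders (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 25.0), (1, 15.0), (2, 40.0)])
    return conn


def customer_totals(crm: sqlite3.Connection, sales: sqlite3.Connection):
    """Query both sources at request time and join the results in memory."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    # Let the sales source do the aggregation, so less data travels to this layer.
    totals = sales.execute("SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id")
    return sorted((names[cid], total) for cid, total in totals if cid in names)


crm, sales = make_crm(), make_sales()
print(customer_totals(crm, sales))  # [('Alice', 40.0), ('Bob', 40.0)]

# Nothing was copied, so a change in a source shows up on the very next query.
sales.execute("INSERT INTO orders VALUES (2, 10.0)")
print(customer_totals(crm, sales))  # [('Alice', 40.0), ('Bob', 50.0)]
```

The flip side is also visible here: every call hits both sources, which is exactly why I wonder about the load on live systems.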
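A tiny example of why writing back through an aggregated row is ambiguous: two different sets of source rows produce the same aggregate, so a new value for the aggregate alone can't tell you which source rows should change. The data and the write-back policy below are made up purely for illustration.

```python
from collections import defaultdict


def totals_by_customer(orders):
    """Aggregate (customer, amount) source rows into one total per customer."""
    totals = defaultdict(float)
    for customer, amount in orders:
        totals[customer] += amount
    return dict(totals)


# Two different states of the source system (hypothetical data)...
source_a = [("Alice", 25.0), ("Alice", 15.0)]
source_b = [("Alice", 10.0), ("Alice", 30.0)]

# ...that look identical through the aggregated view.
print(totals_by_customer(source_a))  # {'Alice': 40.0}
print(totals_by_customer(source_b))  # {'Alice': 40.0}


def naive_write_back(orders, customer, new_total):
    """One arbitrary write-back policy: scale every matching row proportionally.

    Adding the difference to a single row, or inserting a correcting row, would
    also reach the new total; the aggregate alone cannot say which is right.
    """
    old_total = sum(amount for c, amount in orders if c == customer)
    if old_total == 0:
        raise ValueError("cannot scale rows whose total is zero")
    factor = new_total / old_total
    return [(c, amount * factor if c == customer else amount) for c, amount in orders]
```

For example, `naive_write_back(source_a, "Alice", 50.0)` returns `[("Alice", 31.25), ("Alice", 18.75)]`, but that split is a policy choice rather than something the aggregated row itself implies.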