dbt, mysql and terraform
I spent my work-day yesterday porting our dbt models to mysql. We previously had a mysql syncing into postgresql so that I could run dbt on the postgresql database, but of course syncing mysql into postgresql turned out to be hell.
The sync was too expensive with fivetran (which is billing by the monthly active row, and let's say that wordpress and woocommerce love to change rows thanks to their meta attribute schema). I then used pgchameleon, for which I created a nice little terraform module to make it run on ECS. It turns out that pgchameleon tries very hard to port constraints over from mysql to postgresql, but mysql is very lax at enforcing those constraints, while postgresql really isn't. This would lead to “invalid” data when doing synchronizations, especially after migrations, which were hard to infer from the logs and required manual resynchronization. Overall, too much manual work and I never managed to get the system running over long periods of time.
After leaving things be, I returned to see that dbt-mysql now seems quite full-featured, and only had to rewrite a couple of macros to make it work with mysql 5.7. The most important difference, the fact that I couldn't use WITH
queries, was not a problem, since dbt makes it easy to define additional views and refer to them.
While setting up the dbt sync on our dev database, I decided to create a terraform module to set up AWS SSM passwords + mysql users + mysql grants in one single definition. This taught me a few things about terraform:
- you can refactor existing resources using
moved {}
blocks - terraform's mysql provider was deprecated, and was replaced with
petoju/mysql
petoju/mysql
was able to migrate the existing mysql resources, but I could only do so on my x86 computer (because there is not terraform mysql provider for aarcch64)- once the migration was done, I could remove the reference to the previous provider in the terraform lock file (it is kept around so that older state files could still be loaded)
In general, terraform is something I need to learn more of. It is a functional language that seems pretty easy to use and without too many gotchas, and being able to create your own providers would be a great skill to learn, especially in order to manage infrastructure (say, electronics projects) using a gitops workflow.