@mnl@hachyderm.io

dbt, mysql and terraform

I spent my work-day yesterday porting our dbt models to mysql. We previously had a mysql syncing into postgresql so that I could run dbt on the postgresql database, but of course syncing mysql into postgresql turned out to be hell.

The sync was too expensive with fivetran (which is billing by the monthly active row, and let's say that wordpress and woocommerce love to change rows thanks to their meta attribute schema). I then used pgchameleon, for which I created a nice little terraform module to make it run on ECS. It turns out that pgchameleon tries very hard to port constraints over from mysql to postgresql, but mysql is very lax at enforcing those constraints, while postgresql really isn't. This would lead to “invalid” data when doing synchronizations, especially after migrations, which were hard to infer from the logs and required manual resynchronization. Overall, too much manual work and I never managed to get the system running over long periods of time.

After leaving things be, I returned to see that dbt-mysql now seems quite full-featured, and only had to rewrite a couple of macros to make it work with mysql 5.7. The most important difference, the fact that I couldn't use WITH queries, was not a problem, since dbt makes it easy to define additional views and refer to them.

While setting up the dbt sync on our dev database, I decided to create a terraform module to set up AWS SSM passwords + mysql users + mysql grants in one single definition. This taught me a few things about terraform:

In general, terraform is something I need to learn more of. It is a functional language that seems pretty easy to use and without too many gotchas, and being able to create your own providers would be a great skill to learn, especially in order to manage infrastructure (say, electronics projects) using a gitops workflow.