How I Set Up Database Migrations for my Serverless Flask App Deployed with Zappa
Data Engineer @ Meta
Let's keep this short and sweet. This blog post assumes you are familiar with:
- Flask
- Flask-Migrate (Alembic revisions for Flask apps)
- Zappa (serverless deployment on AWS)
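Since you can't open a shell on Lambda to run `flask db upgrade`, one common approach is to expose a small function that calls Flask-Migrate's `upgrade()` and invoke it remotely through Zappa. The sketch below is illustrative only; it assumes an application factory named `create_app` in `app.py` with Flask-Migrate already initialized, so adapt the names to your project.

```python
# migrations_runner.py - hypothetical module name
from flask_migrate import upgrade

from app import create_app  # assumes your app factory lives in app.py


def run_migrations(event=None, context=None):
    """Apply any pending Alembic revisions from inside the deployed Lambda."""
    app = create_app()
    with app.app_context():
        upgrade()  # same effect as running `flask db upgrade` locally
    return "migrations applied"
```

Once deployed, the function can be triggered remotely with something like `zappa invoke <stage> migrations_runner.run_migrations` (the module and stage names here are placeholders).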
Recently, I've been focused on developing a Chrome extension, Skater. This extension started off as a hackathon project among friends, resulting in a scrappy, messy codebase written in vanilla JS. While a lot of fun to develop at the time, revisiting it and making changes without a testing framework in place has been a headache. I've decided to revisit the extension and implement a testing suite with Jest. You can follow along with those updates on the jest-implement branch. As an occasional Windows Subsystem for Linux (WSL) user, I also wanted to get Node and npm set up properly, so that I can switch between Mac and PC at will.
dbt (data build tool) is used to configure, structure, and visualize database objects including tables, views, and processes (the T part of ELT). dbt lends itself to an event-driven architecture where we have some number of raw tables that are routinely populated in the database. Any tables downstream of these raw tables are materialized with dbt. An important distinction: dbt will not help you get raw data into your database, but it will help you model and track the flow of that data downstream.
In data warehousing, we often encounter repetitive processes that can benefit from templating. This is a simple example of creating a COPY INTO statement using some JSON.
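A minimal sketch of the idea, assuming Jinja2 for rendering; the table, stage, and file-format names below are hypothetical:

```python
import json

from jinja2 import Template  # assumes Jinja2 is installed

# Hypothetical JSON describing one load; in practice this might come from a config file.
config = json.loads(
    '{"table": "analytics.raw_events", "stage": "@events_stage", "file_format": "json_format"}'
)

# Snowflake-style COPY INTO statement expressed as a template.
copy_into = Template(
    "COPY INTO {{ table }}\n"
    "FROM {{ stage }}\n"
    "FILE_FORMAT = (FORMAT_NAME = '{{ file_format }}');"
)

print(copy_into.render(**config))
```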
In data engineering, idempotency is the idea that a single ETL job or process will produce the same end result regardless of how many times you re-run it. That means if you have a DAG that runs on 6/15/2020 and you clear and re-run that DAG 1000 times, your data warehouse will still hold the exact same data, with no duplicates. This concept is extremely important and will save you time in the long run.
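As a concrete illustration, an idempotent daily load typically deletes whatever a previous run wrote for the execution date before inserting. The table names below are hypothetical, and sqlite3 stands in for your warehouse connection:

```python
import sqlite3  # stand-in for your warehouse connection
from datetime import date


def load_daily_sales(conn: sqlite3.Connection, run_date: date) -> None:
    """Rebuild one day's aggregate so re-running the same date never duplicates rows."""
    day = run_date.isoformat()
    with conn:
        # Remove anything a previous run may have written for this execution date...
        conn.execute("DELETE FROM sales_daily WHERE sale_date = ?", (day,))
        # ...then rebuild that date's slice from the raw table.
        conn.execute(
            "INSERT INTO sales_daily (sale_date, total) "
            "SELECT sale_date, SUM(amount) FROM sales_raw "
            "WHERE sale_date = ? GROUP BY sale_date",
            (day,),
        )
```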
How can we quickly partition data by date in something like an S3 bucket?
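One approach is to write objects under a Hive-style year=/month=/day= prefix. A rough sketch with boto3, where the bucket name and key prefix are hypothetical and AWS credentials are assumed to be configured:

```python
import json
from datetime import date

import boto3  # assumes AWS credentials are configured


def write_daily_partition(records: list, bucket: str, run_date: date) -> str:
    """Write one day's records under a Hive-style date partition prefix."""
    key = (
        f"events/year={run_date.year}/month={run_date.month:02d}/"
        f"day={run_date.day:02d}/events.json"
    )
    body = "\n".join(json.dumps(r) for r in records)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return key


# Example: write_daily_partition([{"id": 1}], "my-data-lake", date(2020, 6, 15))
# writes to s3://my-data-lake/events/year=2020/month=06/day=15/events.json
```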
Approaching Airflow can seem a little daunting. There are a few hundred Medium articles out there telling you how to set up Airflow, write a DAG, and test a "Hello, World!" ETL. Usually there is a lot to set up before any development, experimentation, or learning can take place. While the articles are helpful, they are often just way too long.
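For reference, the "Hello, World!" DAG those articles build up to can be this small (a sketch assuming Airflow 2.x):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello():
    print("Hello, World!")


# A minimal DAG: one task, runs daily, no backfill of past dates.
with DAG(
    dag_id="hello_world",
    start_date=datetime(2020, 6, 15),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="say_hello", python_callable=say_hello)
```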
First post.
This is a test post for this GitHub blog.