Thinking Out Loud
First post.
The purpose of this blog is to capture and share the knowledge I’ve gained in the data engineering (DE) space. My GitHub consists mostly of projects outside of the DE space: pet projects and repos of general interest to me. That’s not to say I won’t add more DE projects to my GitHub in the future, but I want to use this blog to share more knowledge than can be gathered from my portfolio’s current offerings.
Ideas for things that we can explore:
- Airflow
  - Plugins, Macros, Custom Operators
  - Interacting with the API
  - General purpose pipeline (s3 -> data warehouse)
  - How we can easily experiment and learn with Airflow
- Data Engineering
  - Date partitioning (why and how?)
  - Creating idempotent processes (relates to Airflow)
  - Upcoming tools
- Code Journeys
  - Python for DE
  - Learning new tools & paradigms
    - Learning Scala
    - Learning Kubernetes
I find that one of the toughest things about learning a DE skill set is being able to use the technologies without a commercial subscription. Not everyone has their own Snowflake or AWS account. Some of these things can be done for free, and I’ll make sure that any tutorial is accessible in that way.
Thanks for reading.