For hockey fans, it’s a familiar story. As the clock runs down in the final (3rd) period, teams losing by a goal or two will look to pull their goalie and send out an extra skater in the goalie’s place. This usually results in a 6-on-5 situation for the trailing team, creating offensive pressure and a late-game push.

This move can be effective, but it dramatically increases the chance of the opposition scoring, since they get to shoot on an empty net. Usually it’s just a matter of time until this happens, at which point it’s pretty much game…


Pandas groupby is a powerful function that groups distinct sets within selected columns and aggregates metrics from other columns accordingly.

Performing these operations results in a pivot table, something that’s very useful in data analysis.
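As a minimal sketch of the idea, here is a groupby that aggregates two metric columns per group. The data and the column names (team, goals, shots) are made up for illustration and are not taken from the article.

```python
import pandas as pd

# Hypothetical data: the column names are invented for this sketch.
df = pd.DataFrame(
    {
        "team": ["A", "A", "B", "B", "B"],
        "goals": [2, 3, 1, 0, 4],
        "shots": [30, 25, 20, 18, 35],
    }
)

# Group by the discrete "team" column, then aggregate the metric columns.
# Each keyword is an output column: (source column, aggregation function).
summary = df.groupby("team").agg(
    total_goals=("goals", "sum"),
    mean_shots=("shots", "mean"),
)
print(summary)
```

The result has one row per distinct team and one column per aggregated metric, which is the pivot-table shape described above.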

Kale, flax seed, onion. Why? Read on…

Aggregating Multiple Columns

In this article, I share a technique for computing ad-hoc aggregations that can involve multiple columns. This technique is easy to use and adapt for your needs, and results in code that’s straightforward to interpret.

So what do I mean by “multiple column agg”?

Consider some function y(col1; col2, col3, …) where col1 is a discrete field to group by and the additional columns…


Personal knowledge graphs are rapidly growing in popularity as benefits emerge. There are lots to choose from, but here’s why I love Obsidian.

The Black Tusk in Garibaldi Park, BC

Building my own knowledge graph

Roam Research was the first tool like this that I learned about — their revolutionary graph approach to note taking blew my mind a bit.

I had already heard about the concept of your “mind garden” and loved the imagery that brought forth.

What really resonated with me was the idea of connecting thoughts and events in your life, and the benefit that could bring.

Obsidian

I chose this app because it looked like the best free alternative…


Databases are like Pokémon. Gotta pass data between ’em all!

There are many methods of sending data to and from BigQuery. Some are even documented!

In this post I’ll share a simple method that I use to copy tables from Redshift into BigQuery using only SQL and command line tools.

The Howe Sound Crest Trail, BC

Data Synchronization

While it makes little sense “on paper” to maintain identical copies of the same dataset in separate databases, that is exactly the situation I’ve found myself in.

It’s easy for organizations to become deeply invested in specific technologies. This can be said about programming languages like Java, Python or C++…


In Python it’s okay to make assumptions, as long as you’re able to clean up the mess if they turn out to be wrong. In fact, this is not only okay but considered good practice.
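A minimal sketch of that idea, using a hypothetical get_age helper and a made-up pet record (neither comes from the article): the code assumes the happy path and handles the failure only if it actually happens.

```python
def get_age(pet: dict) -> int:
    """Return the pet's age, assuming the "age" key is present and numeric."""
    try:
        # Make the assumption: the key exists and holds a number-like value.
        return int(pet["age"])
    except (KeyError, ValueError):
        # Clean up the mess: fall back to a sentinel when the assumption fails.
        return -1


print(get_age({"name": "Rex", "age": "4"}))  # 4
print(get_age({"name": "Tom"}))              # -1
```

Catching only the specific exceptions you expect keeps unrelated bugs from being silently swallowed.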

Real life example: dogs & cats

This section is for people who want to learn about the basic try/except control structure. More experienced Pythonistas could skip down to the last section of this blog post, although they would miss one hell of an awesome example.

I am a “dog person”. Nothing against cats — I can definitely see the appeal. Just not for me ;)

Now imagine I’m walking down the…


In this post, I introduce the concept of dynamic DAG creation and explain the significance of Python global variables for Airflow.

What do I mean by “dynamic DAG”?

Dynamic DAG creation is important for scalable data pipeline applications.
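Here is a rough sketch of the pattern. It relies on the fact that Airflow collects DAG objects from a file's global namespace. To stay runnable without Airflow installed, it uses a stand-in FakeDag class where a real DAG file would construct an airflow.DAG. The pipeline names and schedules are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FakeDag:
    """Stand-in for airflow.DAG so the sketch runs without Airflow installed."""

    dag_id: str
    schedule: str


# Hypothetical per-pipeline settings; these might come from a config file instead.
PIPELINES = {
    "client_a": "@daily",
    "client_b": "@hourly",
}

for name, schedule in PIPELINES.items():
    dag_id = f"sync_{name}"
    # Bind each object to its own module-level (global) name. A loop variable
    # alone would be overwritten on every iteration, leaving only the last DAG.
    globals()[dag_id] = FakeDag(dag_id=dag_id, schedule=schedule)

print(sorted(key for key in globals() if key.startswith("sync_")))
```

One loop and one config mapping replace a separate, near-identical script per pipeline.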

When confined to the realm of static DAG scripts, we find ourselves duplicating code in order to create pipelines.

This duplication is undesirable because (usually) it causes an increase in code-base complexity, making DAGs more difficult to update and increasing the chances of bugs appearing.

For example, updated DAGfile code must be copied across each replicated instance, while making sure to keep the intended diffs (e.g. params, custom logic)…


Shortcuts 🥱

Symbolic links are handy for shortcuts in your file explorer.

Maybe you have shortcuts on your desktop to your favorite folders, or maybe you symlink your active projects in your home directory for quick bash access.

For example:

ln -s /Users/alex/Apps/2020/appthing /Users/alex/appthing

This will create a shortcut to ~/Apps/2020/appthing directly in my home dir, i.e. cd ~/appthing

Notice how I used absolute paths when creating the symlink.

You don’t need to use absolute paths, but it’s the best way to stay out of trouble. That said, as you’ll see below, there are times when relative symlinks are useful.
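To make the difference concrete, here is a small Python sketch that creates one absolute and one relative symlink to the same folder. It works in a throwaway temp directory rather than a real home dir, and the link names (appthing_abs, appthing_rel) are invented for the example.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing in the real home dir is touched.
root = os.path.realpath(tempfile.mkdtemp())
target = os.path.join(root, "Apps", "2020", "appthing")
os.makedirs(target)

# Absolute symlink: the stored path is the full path to the target.
abs_link = os.path.join(root, "appthing_abs")
os.symlink(target, abs_link)

# Relative symlink: the stored path is resolved from the link's own directory.
rel_link = os.path.join(root, "appthing_rel")
os.symlink(os.path.join("Apps", "2020", "appthing"), rel_link)

print(os.readlink(abs_link))  # the full path under the temp directory
print(os.readlink(rel_link))  # the short relative path that was stored
print(os.path.realpath(abs_link) == os.path.realpath(rel_link))
```

Both links resolve to the same folder here. If the whole tree is moved together, the relative link keeps working, while the absolute one still points at the old location.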

App config files 🤔

This one is…


If you’re like me then you try to avoid using the mouse whenever possible. This post will help with that.

The Action to Automate

My “normal” workflow involves opening new application windows pretty frequently.

For example, I might open a new terminal window to start working on a project, or I might open a new text editor window to jot down a note.

To do this, I need to hover my cursor over the icon in my dock, right click to pull up the options, and scroll up to the “New window” option.

Lame, right? And I’ve probably done this thousands of…


Last night I fell down the rabbit hole of different ways to configure docker apps with runtime arguments.

Somehow it ended with me searching for ASCII art of a d20. I settled on this:

https://asciiart.website/index.php?art=objects/dice

If you’re thinking “what does ASCII art of dice have to do with docker?”, the answer is nothing really... nothing at all.

Moving along, let’s get right to it.

My app use case

I have a Python application that’s intended to be run with numerous different configurations. In particular, each instance of the application is configured for a different client, and each is set to run on a daily schedule.
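One common way to support this kind of per-client setup is to let the app accept its configuration at runtime, either as a command line argument or as an environment variable. The sketch below is an assumption about how that could look, not the approach from this post. The --client flag, the CLIENT_NAME variable and the client names are all hypothetical.

```python
import argparse
import os


def parse_config(argv=None, env=None):
    """Resolve the client name from a CLI argument, falling back to an env var."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Run the app for one client.")
    # CLIENT_NAME is a hypothetical variable name chosen for this sketch.
    parser.add_argument("--client", default=env.get("CLIENT_NAME"))
    args = parser.parse_args(argv)
    if args.client is None:
        parser.error("set --client or the CLIENT_NAME environment variable")
    return args.client


print(parse_config(["--client", "acme"], env={}))       # acme
print(parse_config([], env={"CLIENT_NAME": "globex"}))  # globex
```

With Docker, an environment variable can be supplied through the -e option of docker run. If the image defines an exec-form ENTRYPOINT that launches the script, arguments placed after the image name are appended to it.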

The app…


Why you should think about using the JSON Lines format in your data processing workflow. We’ll look at some jsonl examples and discuss how I use it day to day.

What is the JSON Lines format?

It’s a file type specification where each line is a JSON object. Just imagine a bunch of stacked up dictionaries. Here’s an example with 4 records:

{"name": "Gilbert", "wins": [["straight", "7♣"]]}
{"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]}
{"name": "May", "wins": []}
{"name": "Deloise", "wins": [["three of a kind", "5♣"]]}
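As a minimal sketch, the four records above can be parsed one line at a time with Python's standard json module. An in-memory io.StringIO stands in for a real .jsonl file here, and the win_counts variable is just for illustration.

```python
import io
import json

# The four example records, one JSON object per line.
raw = io.StringIO(
    '{"name": "Gilbert", "wins": [["straight", "7♣"]]}\n'
    '{"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]}\n'
    '{"name": "May", "wins": []}\n'
    '{"name": "Deloise", "wins": [["three of a kind", "5♣"]]}\n'
)

# Each line parses on its own, so records can be handled one at a time.
records = [json.loads(line) for line in raw if line.strip()]

win_counts = {record["name"]: len(record["wins"]) for record in records}
print(win_counts)

# Writing is the reverse: one json.dumps call per record, one record per line.
out = "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
```

Because no line depends on any other, a reader never needs the whole file in memory to process a record.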

This is an appealing format for data streaming. Since each field has well defined key / value pairs, each…

Alex Galea

Python Data Engineer, MSc. Physics
