Personal knowledge graphs are rapidly growing in popularity as their benefits emerge. There are lots to choose from, but here’s why I love Obsidian.

The Black Tusk in Garibaldi Park, BC

Building my own knowledge graph

Roam Research was the first tool like this that I learned about — their revolutionary graph approach to note taking blew my mind a bit.

I had already heard about the concept of your “mind garden” and loved the imagery that brought forth.

What really resonated with me was the idea of connecting thoughts and events in your life, and the benefits that could bring.

Obsidian

I chose this app because it looked like the best free alternative to Roam…


Databases are like Pokémon. Gotta pass data between ’em all!

There are many methods of sending data to and from BigQuery. Some are even documented!

In this post I’ll share a simple method that I use to copy tables from Redshift into BigQuery using only SQL and command line tools.

The Howe Sound Crest Trail, BC

Data Synchronization

While it makes little sense “on paper” to maintain identical copies of the same dataset in separate databases, that is exactly the situation I’ve found myself in.

It’s easy for organizations to become deeply invested in specific technologies. This can be said about programming languages like Java, Python or C++ and similarly it can be said about cloud tech providers like DigitalOcean, GCP or AWS. …


In Python it’s okay to make assumptions, as long as you’re able to clean up the mess if they turn out to be wrong. In fact, this is not only okay but considered good practice.

Real life example: dogs & cats

This section is for people who want to learn about the basic try/except control structure. More experienced Pythonistas could skip down to the last section of this blog post, although they would miss one hell of an awesome example.


I am a “dog person”. Nothing against cats — I can definitely see the appeal. Just not for me ;)

Now imagine I’m walking down the sidewalk and some leashed animal is walking towards me, but I can’t quite tell if it’s a dog or a cat. …
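The dog-or-cat situation maps onto Python’s EAFP style (“easier to ask for forgiveness than permission”): act on your assumption, and catch the exception if it turns out to be wrong. Here’s a minimal sketch; the Dog and Cat classes and the greet function are hypothetical stand-ins, not code from the original post.

```python
class Dog:
    def bark(self):
        return "Woof!"


class Cat:
    def meow(self):
        return "Meow."


def greet(animal):
    """Assume the animal is a dog; recover if that assumption is wrong."""
    try:
        # Optimistic path: only dogs have a bark() method
        sound = animal.bark()
    except AttributeError:
        # The assumption failed (no bark() method), so clean up and carry on
        return "Oh, a cat. Hello anyway!"
    else:
        # Runs only when the try block raised nothing
        return f"Good dog! ({sound})"


print(greet(Dog()))  # Good dog! (Woof!)
print(greet(Cat()))  # Oh, a cat. Hello anyway!
```

Keeping only the risky call inside try (and the follow-up in else) avoids accidentally swallowing an AttributeError raised somewhere else. The “look before you leap” alternative would be checking hasattr(animal, "bark") first.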


In this post, I introduce the concept of dynamic DAG creation and explain the significance of Python global variables for Airflow.

What do I mean by “dynamic DAG”?

Dynamic DAG creation is important for scalable data pipeline applications.

When confined to the realm of static DAG scripts, we find ourselves duplicating code in order to create pipelines.

This duplication is undesirable because it (usually) increases code-base complexity, making DAGs more difficult to update and increasing the chances of bugs appearing.

For example, updated DAGfile code must be copied across each replicated instance, while making sure to keep the intended diffs (e.g. params, custom logic) intact. …
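A minimal sketch of the idea, under stated assumptions: Airflow discovers DAGs by looking for DAG objects in a DAG file’s module-level (global) namespace, so a single file can loop over per-pipeline settings and bind each generated DAG to its own global name. To keep the sketch runnable without Airflow, FakeDAG is a hypothetical stand-in for airflow.DAG, and create_dag and PIPELINE_PARAMS are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDAG:
    """Hypothetical stand-in for airflow.DAG so this sketch runs without Airflow."""
    dag_id: str
    params: dict = field(default_factory=dict)


def create_dag(dag_id, params):
    # In a real DAG file this would construct an airflow.DAG and attach tasks;
    # the shared pipeline logic lives here exactly once.
    return FakeDAG(dag_id=dag_id, params=params)


# Hypothetical per-pipeline settings: the intended diffs between instances
PIPELINE_PARAMS = {
    "client_a": {"schedule": "@daily", "table": "sales_a"},
    "client_b": {"schedule": "@hourly", "table": "sales_b"},
}

for name, params in PIPELINE_PARAMS.items():
    dag_id = f"etl_{name}"
    # Airflow looks for DAG objects among module globals, so bind each
    # generated DAG to a unique global name.
    globals()[dag_id] = create_dag(dag_id, params)
```

Updating the pipeline now means editing create_dag once, while the per-client differences stay in plain data.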



Shortcuts 🥱

Symbolic links are handy for shortcuts in your file explorer.

Maybe you have shortcuts on your desktop to your favorite folders, or maybe you symlink your active projects in your home directory for quick bash access.

For example:

ln -s /Users/alex/Apps/2020/appthing /Users/alex/appthing

This creates a shortcut to ~/Apps/2020/appthing directly in my home directory, so I can get there with cd ~/appthing.

Notice how I used absolute paths when creating the symlink.

You don’t need to use absolute paths, but they’re the best way to stay out of trouble. That said, as you’ll see below, there are times when relative symlinks are useful.
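The “trouble” with relative symlinks is that their target is resolved relative to the directory containing the link, not the directory you were in when you created it. Here’s a small sketch that shows this with Python’s os.symlink (same argument order as ln -s: target first, then link name), assuming a POSIX system; the temporary “home” directory and the bin folder are made up for the demo.

```python
import os
import tempfile


def demo_symlinks():
    results = {}
    with tempfile.TemporaryDirectory() as home:
        project = os.path.join(home, "Apps", "2020", "appthing")
        os.makedirs(project)
        bin_dir = os.path.join(home, "bin")
        os.makedirs(bin_dir)

        # Absolute target: valid wherever the link itself lives
        os.symlink(project, os.path.join(bin_dir, "appthing_abs"))

        # Relative target written as if from home: resolved against the
        # link's own directory (home/bin), so this link dangles
        rel_from_home = os.path.join("Apps", "2020", "appthing")
        os.symlink(rel_from_home, os.path.join(bin_dir, "appthing_bad"))

        # Relative target written from the link's directory: works
        rel_from_bin = os.path.join("..", "Apps", "2020", "appthing")
        os.symlink(rel_from_bin, os.path.join(bin_dir, "appthing_rel"))

        for name in ("appthing_abs", "appthing_bad", "appthing_rel"):
            # os.path.exists follows the link, so a dangling link reports False
            results[name] = os.path.exists(os.path.join(bin_dir, name))
    return results


print(demo_symlinks())
# {'appthing_abs': True, 'appthing_bad': False, 'appthing_rel': True}
```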

App config files 🤔

This one is more interesting. We are going to use symlinks to expose a swappable, version controlled config collection to the application. …
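One way this idea can work, sketched under assumptions: keep each config as its own file in a version-controlled directory, have the application always open one fixed name (here the hypothetical active.json), and make that name a relative symlink you swap between configs. The file names and the activate helper are illustrative, not taken from the original post.

```python
import json
import os
import tempfile


def activate(config_dir, name, link_name="active.json"):
    """Point config_dir/link_name at config_dir/<name>.json (hypothetical layout)."""
    link_path = os.path.join(config_dir, link_name)
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)  # leftover from an interrupted swap
    # Relative target: resolved against config_dir, so the directory
    # (e.g. a git checkout) can be moved or cloned without breaking the link
    os.symlink(f"{name}.json", tmp_path)
    # Renaming over the old link swaps it in a single step on POSIX systems
    os.replace(tmp_path, link_path)


def demo():
    with tempfile.TemporaryDirectory() as config_dir:
        for env in ("dev", "prod"):
            with open(os.path.join(config_dir, f"{env}.json"), "w") as f:
                json.dump({"env": env}, f)

        seen = []
        for env in ("dev", "prod"):
            activate(config_dir, env)
            # The application only ever opens the fixed link name
            with open(os.path.join(config_dir, "active.json")) as f:
                seen.append(json.load(f)["env"])
        return seen


print(demo())  # ['dev', 'prod']
```

Creating the new link under a temporary name and renaming it over the old one means the application never sees a moment where active.json is missing.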


If you’re like me then you try to avoid using the mouse whenever possible. This post will help with that.


The Action to Automate

My “normal” workflow involves opening new application windows pretty frequently.

For example, I might open a new terminal window to start working on a project, or I might open a new text editor window to jot down a note.

Here’s how it goes: I hover my cursor over the icon in my dock, right-click to pull up the options, and scroll up to the “New window” option.

Lame right? And I’ve probably done this thousands of times in the last few years. …


Last night I fell down the rabbit hole of different ways to configure docker apps with runtime arguments.

Somehow it ended with me searching for ASCII art of a d20... I settled on this:

https://asciiart.website/index.php?art=objects/dice

If you’re thinking “what does ASCII art of dice have to do with docker?”: nothing really... nothing at all.

Moving along, let’s get right to it.

My app use case

I have a Python application that’s intended to be run with numerous different configurations. In particular, each instance of the application is configured for a different client, and each one is set to run on a daily schedule.

The app structure looks something like…
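As a hedged sketch of one common way to feed per-client settings into such an app at runtime: the entry point accepts a command-line flag and falls back to an environment variable. With an exec-form ENTRYPOINT, arguments placed after the image name in docker run are appended to it, and docker run -e sets environment variables. The --client flag, the CLIENT_NAME variable, and the parse_args helper are hypothetical names for illustration.

```python
import argparse
import os


def parse_args(argv=None, env=None):
    """Read per-client settings from CLI flags, falling back to env vars."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Run the app for one client")
    # A CLI flag takes priority; otherwise use the (hypothetical) CLIENT_NAME variable
    parser.add_argument("--client", default=env.get("CLIENT_NAME"))
    args = parser.parse_args(argv)
    if not args.client:
        # parser.error prints usage to stderr and exits with status 2
        parser.error("a client is required (--client or CLIENT_NAME)")
    return args
```

With a hypothetical image called my-app, both docker run my-app --client acme and docker run -e CLIENT_NAME=acme my-app would configure the same instance. Accepting either form lets a scheduler pass settings whichever way is most convenient.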


Why you should think about using JSON Line format in your data processing workflow. We’ll look at some jsonl examples and discuss how I use it day to day.


What is the JSON Line format?

It’s a file type specification where each line is a JSON object. Just imagine a bunch of stacked up dictionaries. Here’s an example with 4 records:

{"name": "Gilbert", "wins": [["straight", "7♣"]]}
{"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]}
{"name": "May", "wins": []}
{"name": "Deloise", "wins": [["three of a kind", "5♣"]]}

This is an appealing format for data streaming. Since each line is a complete JSON object with well-defined key/value pairs, each record in the data stream can be handled in isolation. …
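To make the streaming property concrete, here’s a minimal sketch that uses only the standard library json module to write the four example records and read them back one line at a time. The helper names write_jsonl and read_jsonl are made up for illustration, and skipping malformed lines is one possible policy, not part of the format.

```python
import io
import json

records = [
    {"name": "Gilbert", "wins": [["straight", "7♣"]]},
    {"name": "Alexa", "wins": [["two pair", "4♠"], ["two pair", "9♠"]]},
    {"name": "May", "wins": []},
    {"name": "Deloise", "wins": [["three of a kind", "5♣"]]},
]


def write_jsonl(records, fh):
    for record in records:
        # json.dumps emits no newlines by default, so each record stays on one line;
        # ensure_ascii=False keeps the card suits readable
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(fh):
    for line in fh:
        if not line.strip():
            continue  # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # a corrupt line only costs that one record


buf = io.StringIO()
write_jsonl(records, buf)
buf.write('{"name": "Broken", "wins": [\n')  # simulate a truncated record
buf.seek(0)

win_counts = {r["name"]: len(r["wins"]) for r in read_jsonl(buf)}
print(win_counts)  # {'Gilbert': 1, 'Alexa': 2, 'May': 0, 'Deloise': 1}
```

For real files, open them with encoding="utf-8" so the non-ASCII suit symbols round-trip cleanly. Because the reader is a generator, it never holds more than one record in memory.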


Having clean data is important for data scientists. Oftentimes to achieve this, one needs to deal with missing data. Naively, it might seem intuitive to simply remove it, but this is often not the right answer (especially if training data is scarce).

Quiz Question

On the topic of data cleaning, we’ll be answering the following question:

You’re preparing a 3000 sample training dataset with 40 features. It has 6% of the values missing (assume these are randomly distributed). If you remove all samples that have missing data, what’s the probability that a given sample will be removed?

Try to solve it :)

Don’t read on until you’ve got an answer. …
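Once you have an answer, here’s one way to check it, assuming each value is missing independently with probability 0.06 (one reading of “randomly distributed”). A sample survives only if all 40 of its values are present, so the probability it gets removed is 1 − 0.94^40. The simulation is just a sanity check; the seed and the simulate helper are arbitrary choices for this sketch.

```python
import random

N_SAMPLES = 3000
N_FEATURES = 40
P_MISSING = 0.06

# A sample is kept only if every one of its 40 values is present
p_removed = 1 - (1 - P_MISSING) ** N_FEATURES
print(round(p_removed, 3))  # 0.916


def simulate(seed=0):
    """Fraction of samples with at least one missing value, by simulation."""
    rng = random.Random(seed)
    removed = 0
    for _ in range(N_SAMPLES):
        if any(rng.random() < P_MISSING for _ in range(N_FEATURES)):
            removed += 1
    return removed / N_SAMPLES


print(round(simulate(), 3))  # close to p_removed; varies slightly with the seed
```

Under this assumption, dropping incomplete rows would discard over 90% of the dataset, even though only 6% of the individual values are missing.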


We look at projected distribution, emission and token supply curves for the top cryptocurrencies, including Bitcoin, Ethereum, Ripple, Stellar, EOS, Litecoin, Cardano, Monero, TRON, IOTA, Dash, Ethereum Classic, NEO, Dogecoin, Nano, and BitShares.

Distribution curve comparison (see link below for interactive version)

Interactive Chart:

https://tagto.org/dataviz/crypto-monetary-base/


The source code for these charts is available on GitHub here:

This post is a snapshot of the current outlook as of January 2019. For updated charts you can download and run the source code.

I became interested in currency inflation, and monetary policy in general, while reading Saifedean’s The Bitcoin Standard. It left me feeling very uncomfortable with central banks’ money-printing policies. Just look at what the US Fed has been up to since the 2008 recession. …
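As an illustration of where a supply curve like Bitcoin’s comes from (this is not the author’s chart code, which is linked on GitHub), the issuance schedule can be computed from two well-known protocol constants: a 50 BTC block subsidy that halves every 210,000 blocks, with amounts tracked as integer satoshis (1 BTC = 100,000,000 satoshis). The sketch ignores lost coins and miners claiming less than the full subsidy, so it gives the theoretical maximum.

```python
COIN = 100_000_000          # satoshis per bitcoin
HALVING_INTERVAL = 210_000  # blocks between subsidy halvings
INITIAL_SUBSIDY = 50 * COIN


def supply_curve():
    """Yield (block_height, cumulative_supply_in_satoshis) at each halving."""
    supply = 0
    halvings = 0
    while True:
        # Halving is an integer right shift, so the subsidy eventually hits 0
        subsidy = INITIAL_SUBSIDY >> halvings
        if subsidy == 0:
            return
        supply += subsidy * HALVING_INTERVAL
        halvings += 1
        yield halvings * HALVING_INTERVAL, supply


curve = list(supply_curve())
height, total = curve[-1]
whole, frac = divmod(total, COIN)
print(f"{len(curve)} subsidy eras, final height {height:,}")
# 33 subsidy eras, final height 6,930,000
print(f"max supply: {whole:,}.{frac:08d} BTC")
# max supply: 20,999,999.97690000 BTC
```

Working in integer satoshis avoids floating-point drift, and the rounding from the right shifts is why the cap lands just under 21 million.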

About

Alex Galea

Python Data Engineer, MSc. Physics
