Data Engineering · Setup · ~5 min

Data Engineering track setup.

Complete the platform setup first if you haven’t already. You should have a terminal, Claude Code, Git, and a GitHub account ready.

§ Steps

1. Create your track folder

TERMINAL
mkdir -p ~/dev/data-engineering
cd ~/dev/data-engineering

2. Data engineering tools: let Claude Code do it

Open Claude Code in your track folder:

TERMINAL
claude
PROMPT
I'm setting up a data engineering environment. Please:

1. Install Python 3.11+ via Miniconda, then create a conda environment called "de"
2. Install core packages in the de environment: pandas, duckdb, dbt-core, dbt-duckdb, 
   dagster, dagster-webserver, sqlalchemy, psycopg2-binary
3. Install Docker if not already installed (or tell me how, it needs admin access)
4. Pull the official postgres Docker image and confirm the pull succeeded, so PostgreSQL is ready to run in a container

After each step, verify it worked and show me the result.

Verify

Once Claude Code finishes:

TERMINAL
conda activate de
python --version
python -c "import pandas, duckdb, dbt, dagster, sqlalchemy, psycopg2; print('All packages installed')"
dbt --version
docker --version

You should see a Python version of 3.11 or later, the message "All packages installed", a dbt version, and a Docker version.
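
If the one-line import check fails, it stops at the first missing package and does not tell you about the rest. The sketch below is one possible alternative, assuming you run it inside the activated de environment: it reports every package from the setup prompt separately. The helper name check_packages and the REQUIRED list are illustrative, not part of the track; the list uses import names, so psycopg2-binary appears as psycopg2.

```python
import importlib.util

# Import names (not pip names) for the packages in the setup prompt.
# Assumption: psycopg2-binary is imported as "psycopg2", dbt-core as "dbt".
REQUIRED = ["pandas", "duckdb", "dbt", "dagster", "sqlalchemy", "psycopg2"]


def check_packages(names):
    """Return {module_name: bool}: True if the module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one status line per package instead of failing on the first gap.
    for name, found in check_packages(REQUIRED).items():
        print(f"{name:12} {'ok' if found else 'MISSING'}")
```

Any line marked MISSING is a package to ask Claude Code to reinstall in the de environment.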


3. Your first look

Everything is installed. Before you start Project 1, see what Claude Code can do when you point it at a data engineering problem.

PROMPT
Create two small CSV files: orders.csv (500 rows: order_id, customer_id, order_date, 
product_id, quantity, unit_price) and customers.csv (50 rows: customer_id, name, 
region, signup_date). Then build a simple dbt project that: loads both CSVs as seeds, 
creates a staging model for each, and creates a mart model that joins them into an 
order_summary with total_revenue per customer. Run dbt build and show me the results.
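
To judge what Claude Code returns, it helps to know what the result should look like. The following is a minimal standard-library sketch, not the dbt project itself: it generates data with the same columns and row counts as the prompt and computes total revenue per customer directly. The function names, value ranges, region names, and seeds are all assumptions made for illustration.

```python
import csv
import io
import random
from collections import defaultdict
from datetime import date, timedelta

REGIONS = ["north", "south", "east", "west"]  # assumed values, for illustration


def make_customers(n=50, seed=0):
    """Build n customer rows: customer_id, name, region, signup_date."""
    rng = random.Random(seed)
    start = date(2023, 1, 1)
    return [
        {
            "customer_id": i,
            "name": f"Customer {i}",
            "region": rng.choice(REGIONS),
            "signup_date": (start + timedelta(days=rng.randrange(365))).isoformat(),
        }
        for i in range(1, n + 1)
    ]


def make_orders(customers, n=500, seed=1):
    """Build n order rows that reference existing customer_ids."""
    rng = random.Random(seed)
    start = date(2024, 1, 1)
    ids = [c["customer_id"] for c in customers]
    return [
        {
            "order_id": i,
            "customer_id": rng.choice(ids),
            "order_date": (start + timedelta(days=rng.randrange(365))).isoformat(),
            "product_id": rng.randint(1, 20),
            "quantity": rng.randint(1, 5),
            "unit_price": round(rng.uniform(5, 100), 2),
        }
        for i in range(1, n + 1)
    ]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def total_revenue_by_customer(orders):
    """Sum quantity * unit_price per customer_id."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["customer_id"]] += order["quantity"] * order["unit_price"]
    return dict(totals)
```

Writing to_csv(make_orders(make_customers())) to a file gives you data of the same shape, and total_revenue_by_customer gives you per-customer totals to compare against whatever order_summary reports for the same input.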

As you work through the track, you'll learn why a single prompt isn't enough: why that schema design might not handle slowly changing dimensions, why those joins might produce wrong row counts, why that pipeline needs quality tests, and why a consumer would need freshness guarantees and documentation.
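
The point about joins producing wrong row counts can be made concrete with a small sketch. It uses Python's built-in sqlite3 module rather than dbt or DuckDB, and the tables and values are invented for illustration. One customer appears twice in customers, as might happen if a region change were appended as a new row, and the join silently inflates both the row count and the revenue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                         quantity INTEGER, unit_price REAL);
    CREATE TABLE customers (customer_id INTEGER, name TEXT, region TEXT);
    INSERT INTO orders VALUES (1, 10, 2, 5.0), (2, 10, 1, 20.0), (3, 11, 3, 10.0);
    -- customer 10 appears twice: same id, two regions
    INSERT INTO customers VALUES
        (10, 'Ada', 'north'), (10, 'Ada', 'south'), (11, 'Bo', 'east');
""")


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


JOIN = "FROM orders o JOIN customers c ON o.customer_id = c.customer_id"

orders_rows = scalar("SELECT COUNT(*) FROM orders")
joined_rows = scalar(f"SELECT COUNT(*) {JOIN}")
true_revenue = scalar("SELECT SUM(quantity * unit_price) FROM orders")
joined_revenue = scalar(f"SELECT SUM(o.quantity * o.unit_price) {JOIN}")

# A uniqueness check on the join key reveals the cause.
duplicates = [
    row[0]
    for row in conn.execute(
        "SELECT customer_id FROM customers "
        "GROUP BY customer_id HAVING COUNT(*) > 1"
    )
]

print(f"orders: {orders_rows} rows, after join: {joined_rows} rows")
print(f"revenue from orders: {true_revenue}, after join: {joined_revenue}")
print(f"duplicated customer_ids: {duplicates}")
```

Three orders become five joined rows, and revenue rises from 60.0 to 90.0 without any error being raised. A uniqueness check on customer_id, like the final query here, is the kind of quality test that catches this before a consumer sees the numbers.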

But for now, look at what just happened. That's the starting point.