Where Are the Data Science Jobs in Germany? 🇩🇪

Data Science
Jobs
Germany
Mapping

We explore which German cities are hiring data scientists, using job listing data from the Adzuna API and visualizing the distribution with geopandas and plotly.

Author

Flavia Felletti

Published

June 5, 2025

Keywords

Adzuna, Data Science, Germany, Job Market, API, geopandas, plotly, data visualization

Introduction

In this project, we explore data science job opportunities across Germany. Our main goal is to identify which cities offer the most data science positions. To do this, we collected job listings using the Adzuna API, focusing on postings within the broader IT job category. Adzuna is a job search engine that aggregates listings from thousands of websites to help users find employment opportunities across locations and industries; it also provides salary insights, hiring trends, and other services. By filtering for roles that include terms like data science, data analyst, and machine learning, we created a dataset tailored to the evolving data science job market in Germany. After cleaning the data, we build an interactive map using geopandas and plotly, which lets the reader easily see which states and cities offer the most opportunities.

Note: The free tier of the API allows us to extract at most 1,000 entries, and since we start from the broader IT category, we will need to exclude many jobs that are not related to data science.
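For completeness, the collection step can be sketched roughly as follows. The endpoint shape follows Adzuna's documented search API; `APP_ID`/`APP_KEY` are placeholders, and the exact parameters of the original pull are an assumption:

```python
# Sketch of pulling IT-category job listings from the Adzuna search API.
# APP_ID / APP_KEY are placeholders -- register at developer.adzuna.com for real keys.
APP_ID = "your_app_id"
APP_KEY = "your_app_key"

def build_search_request(page, results_per_page=50):
    """Return the URL and query parameters for one page of German IT-job results."""
    url = f"https://api.adzuna.com/v1/api/jobs/de/search/{page}"
    params = {
        "app_id": APP_ID,
        "app_key": APP_KEY,
        "results_per_page": results_per_page,
        "category": "it-jobs",  # broader IT category, filtered further below
    }
    return url, params

def fetch_jobs(max_results=1000, results_per_page=50):
    """Page through the API until max_results postings are collected."""
    import requests  # imported lazily so the sketch can be read without running it
    jobs = []
    for page in range(1, max_results // results_per_page + 1):
        url, params = build_search_request(page, results_per_page)
        resp = requests.get(url, params=params, timeout=30)
        resp.raise_for_status()
        jobs.extend(resp.json().get("results", []))
    return jobs
```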

📊 Load Libraries and Data

Code
# libraries
import pandas as pd
import numpy as np
import ast
import re
import requests
import json

# geopandas
import geopandas as gpd
from unidecode import unidecode

# plotly
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

# load data
df = pd.read_csv("adzuna_data_science_jobs_1.csv")
df.head(3)
latitude location id category created adref company salary_is_predicted longitude __CLASS__ contract_time redirect_url description title contract_type salary_min salary_max
0 52.507706 {'display_name': 'Berlin, Deutschland', 'area': ['Deutschland', 'Berlin'], '__CLASS__': 'Adzuna::API:... 5235945980 {'label': 'IT-Stellen', 'tag': 'it-jobs', '__CLASS__': 'Adzuna::API::Response::Category'} 2025-06-06T11:59:23Z eyJhbGciOiJIUzI1NiJ9.eyJzIjoidm9UMnhNSkM4Qkc3SnY2cFMzY0hDUSIsImkiOiI1MjM1OTQ1OTgwIn0.-MJ0xNHgRufgAxgb... {'__CLASS__': 'Adzuna::API::Response::Company', 'display_name': 'Amazon'} 0 13.420806 Adzuna::API::Response::Job full_time https://www.adzuna.de/details/5235945980?utm_medium=api&utm_source=a4bcf954 Do you want to solve business challenges through innovative technology? Do you enjoy working on cutti... 2025 Software Development Engineer - Machine Learning (m/w/d) NaN NaN NaN
1 48.137418 {'display_name': 'München, München (Kreis)', 'area': ['Deutschland', 'Bayern', 'München (Kreis)', 'Mü... 5235921302 {'label': 'IT-Stellen', 'tag': 'it-jobs', '__CLASS__': 'Adzuna::API::Response::Category'} 2025-06-06T11:18:47Z eyJhbGciOiJIUzI1NiJ9.eyJzIjoidm9UMnhNSkM4Qkc3SnY2cFMzY0hDUSIsImkiOiI1MjM1OTIxMzAyIn0.c8Ovo7vPoqKlbugZ... {'__CLASS__': 'Adzuna::API::Response::Company', 'display_name': 'OfferZen'} 0 11.555737 Adzuna::API::Response::Job NaN https://www.adzuna.de/details/5235921302?utm_medium=api&utm_source=a4bcf954 Senior Net Programmer Are you an EU citizen looking for the next big move in your developer career? O... Senior Net Programmer NaN NaN NaN
2 52.507706 {'__CLASS__': 'Adzuna::API::Response::Location', 'area': ['Deutschland', 'Berlin'], 'display_name': '... 5235921295 {'label': 'IT-Stellen', 'tag': 'it-jobs', '__CLASS__': 'Adzuna::API::Response::Category'} 2025-06-06T11:18:46Z eyJhbGciOiJIUzI1NiJ9.eyJpIjoiNTIzNTkyMTI5NSIsInMiOiJ2b1QyeE1KQzhCRzdKdjZwUzNjSENRIn0.NjBMHPOsEBjNxEp6... {'display_name': 'OfferZen', '__CLASS__': 'Adzuna::API::Response::Company'} 0 13.420806 Adzuna::API::Response::Job NaN https://www.adzuna.de/details/5235921295?utm_medium=api&utm_source=a4bcf954 Junior Back-End Developer Are you an EU citizen looking for the next big move in your developer caree... Junior Back-End Developer NaN NaN NaN

Dimensions of the dataset:

Code
df.shape
(1000, 17)

Examples of job titles:

Code
df.title.unique()[:5]
array(['2025 Software Development Engineer - Machine Learning (m/w/d)',
       'Senior Net Programmer', 'Junior Back-End Developer',
       'Junior .NET Developer', 'Intermediate Back-End Developer'],
      dtype=object)

Example of job description:

Code
df.description.iloc[0]
'Do you want to solve business challenges through innovative technology? Do you enjoy working on cutting-edge, scalable services technology in a team environment? Do you like working on industry-defining projects that move the needle? At Amazon, we hire the best minds in technology to innovate and build on behalf of our customers. The intense focus we have on our customers is why we are one of the world’s most beloved brands – customer obsession is part of our company DNA. Our Software Developme…'

Example of (raw) job locations:

Code
df.location.unique()[:5]
array(["{'display_name': 'Berlin, Deutschland', 'area': ['Deutschland', 'Berlin'], '__CLASS__': 'Adzuna::API::Response::Location'}",
       "{'display_name': 'München, München (Kreis)', 'area': ['Deutschland', 'Bayern', 'München (Kreis)', 'München'], '__CLASS__': 'Adzuna::API::Response::Location'}",
       "{'__CLASS__': 'Adzuna::API::Response::Location', 'area': ['Deutschland', 'Berlin'], 'display_name': 'Berlin, Deutschland'}",
       "{'display_name': 'Berlin, Deutschland', '__CLASS__': 'Adzuna::API::Response::Location', 'area': ['Deutschland', 'Berlin']}",
       "{'area': ['Deutschland', 'Berlin'], '__CLASS__': 'Adzuna::API::Response::Location', 'display_name': 'Berlin, Deutschland'}"],
      dtype=object)

🗺️ Extracting and Cleaning State (Bundesland) and City Information

Each job posting in the Adzuna dataset includes a location field, which contains a nested list of geographic areas. We extracted the state (Bundesland) and the city name from here. However, the raw data sometimes included inaccurate or incomplete values, which required some cleaning.

Code
# Convert the stringified dictionaries into real dictionaries
df['location'] = df['location'].apply(ast.literal_eval)

# Extract 'state' and 'city' from the nested location dictionary
def extract_state(location):
    try:
        return location['area'][1]  # 0 is 'Deutschland', 1 is the state
    except (TypeError, IndexError, KeyError):
        return None

def extract_city(location):
    try:
        # Return the most specific city-level info
        return location['area'][-1]  # The last item is usually the city or district
    except (TypeError, IndexError, KeyError):
        return None

# Apply to your dataframe
df['state'] = df['location'].apply(extract_state)
df['city'] = df['location'].apply(extract_city)

🧹 Cleaning State Information

Adzuna location data often include a region or Bundesland. We extracted this value and dropped the entries where this value was missing.

List of states:

Code
# Check states
df.state.unique()

# check why the state is None for some locations
pd.options.display.max_colwidth = 105
df[df["state"].isnull()].location.head()

# remove entries where the state is None
df = df[df['state'].notna()]
df.state.unique()
array(['Berlin', 'Bayern', 'Baden-Württemberg', 'Nordrhein-Westfalen',
       'Niedersachsen', 'Hessen', 'Schleswig-Holstein', 'Hamburg',
       'Bremen', 'Brandenburg', 'Mecklenburg-Vorpommern',
       'Sachsen-Anhalt', 'Thüringen', 'Rheinland-Pfalz', 'Sachsen',
       'Saarland'], dtype=object)

🧹 Clean the City

Cleaning the city information required more substantial work: besides deleting entries with missing or incorrect city values (e.g., “Deutschland” or “Bayern”), we had to manually create a mapping of the cities and check which names actually refer to cities, and which are instead names of city districts (e.g., “HafenCity”, a district of Hamburg).

Examples of city names:

Code
df.city.unique()[:20]
array(['Berlin', 'München', 'Böckingen', 'Donaustetten', 'Würden', 'Bult',
       'Mitte', 'Kluftern', 'Rheinfelden (Baden)', 'Arheilgen',
       'Frankfurt am Main', 'Schweinfurt', 'Reutlingen (Kreis)',
       'Höchberg', 'Köln', 'Kiel', 'Pliezhausen', 'Hersbruck',
       'Region Hannover (Kreis)', 'Münster'], dtype=object)
Show city mapping
# Cleaning cities
city_mapping = {
    'Böckingen': 'Heilbronn',
    'Würden': 'Gummersbach',
    'Dirrfelden': 'Roggenburg',
    'Milbertshofen-Am Hart': 'München',
    'Hamburg-Altstadt': 'Hamburg',
    'Gallus': 'Frankfurt am Main',
    'Laim': 'München',
    'Altstadt-Süd': 'Köln',
    'HafenCity': 'Hamburg',
    'Hagsfeld': 'Karlsruhe',
    'Hulsberg': 'Bremen',
    'Holtenau': 'Kiel',
    'Freisenbruch': 'Essen',
    'Altstadt-Nord': 'Köln',
    'Altona-Altstadt': 'Hamburg',
    'Berliner Vorstadt': 'Potsdam',
    'Sachsenhausen': 'Frankfurt am Main',
    'Berg am Laim': 'München',
    'Harvestehude': 'Hamburg',
    'Maxvorstadt': 'München',
    'Friedrichshain': 'Berlin',
    'Brinckmansdorf': 'Rostock',
    'Dornbusch': 'Frankfurt am Main',
    'Kreuzberg': 'Berlin',
    'Altstadt-Lehel': 'München',
    'Kramersfeld': 'Bamberg',
    'Wedding': 'Berlin',
    'Wilhelmsburg': 'Hamburg',
    'Frohnhausen': 'Essen',
    'Sonnenberg': 'Wiesbaden',
    'Friedrichstadt': 'Düsseldorf',
    'Ludwigsvorstadt-Isarvorstadt': 'München',
    'Bad Godesberg': 'Bonn',
    'Heerdt': 'Düsseldorf',
    'Aubing-Lochhausen-Langwied': 'München',
    'Altendorf': 'Essen',
    'Centrum': 'Dortmund',
    'Frillendorf': 'Essen',
    'Isernhagen-Süd': 'Isernhagen',
    'Moabit': 'Berlin',
    'Benhausen': 'Paderborn',
    'Eicken': 'Mönchengladbach',
    'Weingartshof': 'Ulm',
    'Sendling': 'München',
    'Leutzsch': 'Leipzig',
    'Günterstal': 'Freiburg im Breisgau',
    'Rennweg': 'Nürnberg',
    'Gebersdorf': 'Nürnberg',
    'Eil': 'Köln',
    'Düsseltal': 'Düsseldorf',
    'Ermingen': 'Ulm',
    'Gremmendorf': 'Münster',
    'Derendorf': 'Düsseldorf',
    'Geist': 'Münster',
    'Baumheide': 'Bielefeld',
    'Brünninghausen': 'Dortmund',
    'Neuostheim': 'Mannheim',
    'Gleißbühl': 'Nürnberg',
    'Handelshäfen': 'Duisburg',
    'Malstatt': 'Saarbrücken',
    'Oberstadt': 'Mainz',
    'Au-Haidhausen': 'München',
    'Friedrichsfelde': 'Berlin',
    'Bärenkeller': 'Augsburg',
    'Coerde': 'Münster',
    'Braunsfeld': 'Köln',
    'Elberfeld': 'Wuppertal',
    'Nordend-Ost': 'Frankfurt am Main',
    'Antonsviertel': 'Augsburg',
    'Walheim': 'Aachen',
    'Sieker': 'Bielefeld',
    'Zwillingshof': 'Halle (Saale)',
    'Laurensberg': 'Aachen',
    'Fahrlach': 'Mannheim',
    'Donaustetten': 'Ulm',
    'Bult': 'Hannover',
    'Mitte': 'Berlin',  # defaulting Mitte to Berlin
    'Kluftern': 'Friedrichshafen',
    'Arheilgen': 'Darmstadt',
    'Reutlingen (Kreis)': 'Reutlingen',
    'Region Hannover (Kreis)': 'Hannover',
    'Region Hannover': 'Hannover',
    'Altstadt': None,  # ambiguous
    'Innenstadt': None,  # ambiguous
    'Zentrum': None,  # ambiguous
    'Kassel (Kreis)': 'Kassel',
    'Neukölln': 'Berlin',
    'Friedrichsgabe': 'Norderstedt',
    'Kleinzschocher': 'Leipzig',
    'Gellershagen': 'Bielefeld',
    'Garching bei München': 'München',
    'Kronberg im Taunus': 'Kronberg',
    'Nordrhein-Westfalen': None,  # state, not city
    'Freiburg (Elbe)': 'Freiburg im Breisgau'
}
Show removed cities and suffixes
# Cities to remove
to_remove = ['Deutschland', 'Bayern']

# Remove cities "Deutschland" and "Bayern"
df = df[~df['city'].isin(to_remove)].copy()

# Remove "(Kreis)" suffixes
df['city'] = df['city'].str.replace(r'\s*\(Kreis\)', '', regex=True)
Code
# Use the mapping to change the city name and remove entries where the city is None
def clean_city(city):
    # If city is in mapping and mapped to None, remove it (return None)
    if city in city_mapping:
        return city_mapping[city]  # Could be None
    else:
        return city  # Keep as is if not in mapping

df['city'] = df['city'].apply(clean_city)

# Drop rows where city became None after mapping (ambiguous or unwanted)
df = df[df['city'].notna()].reset_index(drop=True)
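The same map-then-drop step can also be written with `dict.get`, which keeps unmapped names as-is while preserving the `None` sentinel for ambiguous ones. A minimal standalone sketch (the two mapping entries are just an excerpt):

```python
import pandas as pd

# excerpt of the full city mapping, for illustration only
demo_mapping = {
    'HafenCity': 'Hamburg',
    'Altstadt': None,  # ambiguous -> drop
}

demo = pd.DataFrame({'city': ['HafenCity', 'Altstadt', 'Berlin']})

# dict.get(c, c) keeps unmapped names unchanged and maps ambiguous ones to None
demo['city'] = demo['city'].map(lambda c: demo_mapping.get(c, c))
demo = demo[demo['city'].notna()].reset_index(drop=True)
```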

💼 Extract and Categorize Data Science Jobs

Code
# define keywords that we will use to extract data science jobs
data_roles_keywords = [
    'data scientist', 'data science', 'data analyst', 'data analytics',
    'data engineer', 'data engineering', 'analytics',
    'machine learning', 'ml', 'ai', 'artificial intelligence',
    'deep learning', 'business intelligence', 'etl', 'sql',
    'nlp', 'statistical', 'statistics', 'snowflake', 'sas',
    'data mining', 'data architecture', 'big data', 'predictive modeling',
    'computer vision', 'power bi'
]

# Build a non-capturing regex pattern
pattern = r'\b(?:' + '|'.join(re.escape(term) for term in data_roles_keywords) + r')\b'

# exclude Prompt Evaluator listings (language-evaluation roles, not data science)
df = df[~df['title'].str.contains("Prompt Evaluator.*Hebrew|Prompt Evaluator.*Croatian", case=False, na=False)]

# Apply the filter (re.IGNORECASE already handles case, so lowercasing first is unnecessary)
df = df[df['title'].str.contains(pattern, flags=re.IGNORECASE, regex=True, na=False)]
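The `\b` word boundaries in the pattern matter here: short keywords like `ml` and `ai` would otherwise match inside unrelated words. A quick check of the behavior, using a trimmed keyword list:

```python
import re

demo_keywords = ['ml', 'ai', 'data science']
demo_pattern = r'\b(?:' + '|'.join(re.escape(k) for k in demo_keywords) + r')\b'

# 'ml' matches as a standalone word ...
assert re.search(demo_pattern, 'Senior ML Engineer', re.IGNORECASE)
# ... but not inside other words like 'html' or 'air'
assert not re.search(demo_pattern, 'HTML Developer', re.IGNORECASE)
assert not re.search(demo_pattern, 'Air Traffic Controller', re.IGNORECASE)
```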
Code
# categorize jobs (word-boundary matching so short keywords like 'bi'
# don't match inside longer words such as 'Ausbildung')
def categorize_job_title(title):
    title = title.lower()

    def has_any(keywords):
        return any(re.search(r'\b' + re.escape(k) + r'\b', title) for k in keywords)

    if has_any(['data scientist', 'statistician', 'ml researcher']):
        return 'Data Scientist'

    elif has_any(['machine learning', 'ml engineer', 'ai engineer', 'artificial intelligence', 'computer vision', 'nlp', 'genai', 'llm', 'deep learning', 'prompt engineer']):
        return 'Machine Learning / AI Engineer'

    elif has_any(['data analyst', 'datenanalyst', 'business intelligence', 'bi', 'analytics manager', 'analytics engineer', 'product analyst', 'marketing insights', 'web analytics']):
        return 'Data Analyst / BI'

    elif has_any(['data engineer', 'etl', 'elt', 'ml ops', 'mlops', 'cloud data', 'big data', 'pipeline', 'migrations']):
        return 'Data Engineer / MLOps'

    else:
        return 'Other'

# Apply it to your dataframe
df['job_category'] = df['title'].apply(categorize_job_title)

df[['title', 'job_category']].head(5)
title job_category
0 2025 Software Development Engineer - Machine Learning (m/w/d) Machine Learning / AI Engineer
9 Bachelor Data Science & KI (m/w/d) – Duales Studium Other
10 Duale Studenten* Data Science & künstliche Intelligenz – Bachelor Other
11 Data Scientist (m/w/x) Data Scientist
17 Softwareentwickler/in KI / Machine Learning (w/m/d) Machine Learning / AI Engineer

Drop irrelevant columns

As we will not use most of the columns in our analysis, and we have already extracted the relevant data from the location and title, we can drop some columns:

Code
# Drop irrelevant columns
columns_to_drop = ['category', '__CLASS__', 'adref', 'location']
df = df.drop(columns=columns_to_drop, errors='ignore')

#df.head()
df.shape
(180, 16)

We now have 180 entries remaining, which is sufficient for the purpose of our analysis.

🗺️ Creating an Interactive Map of Data Science Jobs in Germany

To visualize where data science jobs are located across Germany, I created an interactive map combining:

  • ✅ A choropleth map showing job counts by federal state
  • ✅ A scatter plot displaying job locations by city (based on the coordinates)

🧰 Tools Used

  • GeoPandas to handle German states GeoJSON
  • Plotly for interactive mapping
  • SimpleMaps data for states and city coordinates, freely available from the SimpleMaps website

🧹 Data Cleaning & Preparation

  1. Counted job postings per state and per city.
  2. Loaded a GeoJSON file (de.json) for German states.
  3. Cleaned and normalized city names.
  4. Mapped cleaned city names to coordinates using a reference file (de.csv) from SimpleMaps.
  5. Filled in missing coordinates using known data from the original dataset when needed.

Note: we did not directly use the coordinates available in the original dataset because, for some cities, multiple coordinates were present, which created repeated city names on the graphic.


📍 Plotting with Plotly

  • State-level choropleth was created with go.Choropleth, linked to the GeoJSON features.
  • City-level scatter plot used go.Scattergeo, with hover labels showing job counts.
  • Both were combined in a single graphic.
Code
pio.renderers.default = "iframe_connected"  


# Load GeoJSON of German states
germany_states_gdf = gpd.read_file("de.json")

# Load city coordinates from simplemaps CSV
simplemaps_df = pd.read_csv("de.csv")  # with columns: 'city', 'lat', 'lng'

# Clean city names for matching
def clean_city(name):
    name = re.sub(r'\s*\(.*?\)', '', name)  # remove parentheses content
    return unidecode(name.lower().strip())  # normalize umlauts, lower case, strip spaces

df['city_clean'] = df['city'].apply(clean_city)
simplemaps_df['city_clean'] = simplemaps_df['city'].apply(clean_city)

# Manual city name mapping for identified mismatches
manual_city_map = {
    'muenchen': 'munich',
    'koeln': 'cologne',
    'nurnberg': 'nuremberg',
    'frankfurt am main': 'frankfurt',
}
df['city_clean_mapped'] = df['city_clean'].replace(manual_city_map)

# Merge city coordinates
merged_df = df.merge(
    simplemaps_df[['city_clean', 'lat', 'lng']],
    left_on='city_clean_mapped',
    right_on='city_clean',
    how='left'
)

# Prefer SimpleMaps coordinates; fall back to the original df where they are missing
merged_df['latitude'] = np.where(merged_df['lat'].notna(), merged_df['lat'], merged_df.get('latitude'))
merged_df['longitude'] = np.where(merged_df['lng'].notna(), merged_df['lng'], merged_df.get('longitude'))

# drop unnecessary columns to avoid confusion
merged_df = merged_df[['city', 'state', 'latitude', 'longitude']]

# Prepare state counts for choropleth
state_counts = merged_df['state'].value_counts().reset_index()
state_counts.columns = ['state', 'count']

# Merge with GeoDataFrame to ensure all states appear (even if there are no jobs)
all_states = germany_states_gdf[['name']].rename(columns={'name': 'state'})
complete_counts = all_states.merge(state_counts, on='state', how='left').fillna(0)
complete_counts['count'] = complete_counts['count'].astype(int)

merged_states = germany_states_gdf.merge(complete_counts, left_on='name', right_on='state', how='left')

# Convert GeoDataFrame to GeoJSON for Plotly
germany_geojson = json.loads(merged_states.to_json())

# Aggregate city job counts and average coordinates
if 'count' not in merged_df.columns:
    merged_df['count'] = 1  # each row = one job posting

city_grouped = (
    merged_df.groupby('city')
    .agg({'count': 'sum', 'latitude': 'mean', 'longitude': 'mean'})
    .reset_index()
)

# Aggregate state job counts (already done)
state_grouped = complete_counts[['state', 'count']]

# Plot with Plotly
choropleth = go.Choropleth(
    geojson=germany_geojson,
    locations=state_grouped['state'],
    z=state_grouped['count'],
    featureidkey="properties.name",
    colorscale="Blues",
    colorbar_title="Job Count",
    hovertemplate="<b>%{location}</b><br>Jobs: %{z}<extra></extra>",
    name="States"
)

scatter = go.Scattergeo(
    lon=city_grouped['longitude'],
    lat=city_grouped['latitude'],
    text=city_grouped['city'] + "<br>Jobs: " + city_grouped['count'].astype(str),
    mode='markers',
    marker=dict(size=6, color='red', opacity=0.7),
    hoverinfo='text',
    name='Cities'
)

fig = go.Figure(data=[choropleth, scatter])
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(
    title="Data Science Job Postings in Germany by State & City",
    title_x=0.5,
    width=1000,
    height=800,
    legend=dict(x=0.01, y=0.99)
)

fig.show()

Data Science Jobs in Germany

🧾 Conclusion

Our analysis of data science job postings in Germany reveals a strong concentration in key regions and cities:

  • North Rhine-Westphalia is the state with the highest number of data science job postings (39), followed by Bavaria (31), Lower Saxony (26), and Berlin (25).
  • At the city level, Berlin tops the list with 31 job postings, significantly ahead of other cities. Hannover (22), Munich (20), and Cologne (11) also show notable demand.
  • Interestingly, some smaller cities like Weeze (8) and Karlsruhe (7) appear in the top 10, indicating that opportunities are not limited to major hubs.

These patterns suggest that while traditional tech and business centers dominate, data science roles are emerging in a wider range of locations across Germany.
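The rankings above come straight from `value_counts()` on the cleaned dataframe. A minimal standalone sketch of the computation (on toy data, not the real counts):

```python
import pandas as pd

# toy stand-in for the cleaned dataframe of job postings
demo = pd.DataFrame({
    'state': ['Nordrhein-Westfalen', 'Bayern', 'Nordrhein-Westfalen', 'Berlin'],
    'city':  ['Köln', 'München', 'Düsseldorf', 'Berlin'],
})

# one row per posting, so value_counts gives postings per state / city
top_states = demo['state'].value_counts().head(10)
top_cities = demo['city'].value_counts().head(10)
```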

Next steps

In the next post we will analyze a larger dataset of data science job postings and focus on answering questions such as:

  • Which are the top roles by location?
  • Which are the top hiring companies?

Stay tuned :)