Analyzing EPFL Restaurant Menus

From Data Scraping to EDA and Network Visualization

GitHub Live Demo

In the world of data science, every dataset has a story to tell. In this article, we’ll embark on a journey to extract, analyze, and visualize EPFL’s restaurant menus. We’ll cover data scraping, exploratory data analysis (EDA), network processing, and even automation to keep our dataset up-to-date.

  • Data Scraping: The project starts with web scraping techniques to extract menu information, including item names, prices, and vegetarian options, from various EPFL campus restaurant websites.

  • Exploratory Data Analysis (EDA): With the menu data in hand, we leverage Python’s Pandas and Seaborn libraries for exploratory data analysis. This step allows us to uncover insights into menu trends, price distributions, and the proportion of vegetarian meals over time.

  • Network Analysis: Going beyond traditional analysis, we delve into the text of the meal names themselves. By tokenizing meal names, building a text network from them, and visualizing it with D3.js, we identify common ingredients and the connections between them.

  • Automation: To ensure that menu data remains up-to-date without manual intervention, we’ve implemented automation using GitHub Actions. A scheduled workflow runs the menu scraping script daily and rebuilds the network, guaranteeing access to the latest information.


Data Scraping

We kickstart our journey with Python, a versatile language that excels at web scraping. We’ll use Python libraries such as requests, BeautifulSoup, and pandas to fetch and structure our data.

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
from tqdm import tqdm

Getting the Menu Data for a Given Date

Our first goal is to retrieve the restaurant menu data for EPFL. We create a function get_menu(date) to fetch the menu for a specific date:

def get_menu(date):
    '''
        Get the menu for a given date.
        Input:
            - date (string): string in the format YYYY-MM-DD
        Returns:
            - menus (DataFrame): DataFrame containing the menu for the given date
    '''

    # Get the page content
    url = 'https://www.epfl.ch/campus/restaurants-shops-hotels/fr/offre-du-jour-de-tous-les-points-de-restauration/?date=' + date
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
    r = requests.get(url, headers=headers)
    r.encoding = 'utf-8'
    soup = BeautifulSoup(r.text, 'html.parser')
    
    # Create an empty DataFrame
    menus = pd.DataFrame(columns=['name', 'restaurant', 'etudiant', 'doctorant', 'campus', 'visiteur', 'vegetarian'])
    menus['vegetarian'] = menus['vegetarian'].astype('bool')

    # Find the menu table
    menuTable = soup.find('table', id='menuTable')
    if not menuTable:
        return menus
    
    # Find all menu items
    lunches = menuTable.find_all('tr', class_='lunch')
    if not lunches:
        return menus

    for item in lunches:
        name = item.find('div', class_="descr").text.replace('\n', '\\n')
        restaurant = item.find('td', class_='restaurant').text
        
        # If item has class 'vegetarian', then it's vegetarian
        vegetarian = 'vegetarian' in item['class']

        # Extract all price options for this menu item
        price_elements = item.find_all('span', class_='price')
        prices_dict = {}
        
        # Check if there is at least one price element
        valid_menu = True
        if price_elements:
            for price_element in price_elements:
                price_text = price_element.text.strip()
                price_category = price_element.find('abbr', class_='text-primary')
                
                # Check if there is a price category (abbreviation)
                if price_category:
                    category = price_category.text.strip()
                else:
                    category = 'default'
                    
                # Check if the price format is valid (no characters after CHF)
                if not re.search(r'CHF\s*\S', price_text):
                    price_match = re.search(r'([\d.]+)\s*CHF', price_text, re.IGNORECASE)
                    if price_match:
                        price = float(price_match.group(1))
                        prices_dict[category] = price
                else:
                    # Discard the menu if this condition is met
                    valid_menu = False
                    break

        # If the menu is valid, add it to the DataFrame
        if valid_menu:
            # Fill missing prices with the default price, if it exists
            if 'default' in prices_dict:
                default_price = prices_dict['default']
                for category in ['E', 'D', 'C', 'V']:
                    if category not in prices_dict:
                        prices_dict[category] = default_price
            else:
                # Set missing prices to None
                for category in ['E', 'D', 'C', 'V']:
                    if category not in prices_dict:
                        prices_dict[category] = None

            menus = pd.concat([menus, pd.DataFrame({
                'name': [name],
                'restaurant': [restaurant],
                'etudiant': [prices_dict['E']],
                'doctorant': [prices_dict['D']],
                'campus': [prices_dict['C']],
                'visiteur': [prices_dict['V']],
                'vegetarian': [vegetarian]
            })])

    return menus
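
As a quick sanity check, we can fetch a single day and eyeball the result (the date below is purely illustrative):

# Fetch one day's menus; any past date in YYYY-MM-DD format works
menu = get_menu('2023-05-15')
print(menu.head())
print(f"{len(menu)} items, {menu['vegetarian'].mean():.0%} vegetarian")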

Fetching Menus for a Date Range

To build a comprehensive dataset, we create another function get_all_menus(start_date, end_date) to fetch menus for a range of dates:

def get_all_menus(start_date, end_date):
    '''
        Get the menus for a given date range.
        Input:
            - start_date (string): string in the format YYYY-MM-DD
            - end_date (string): string in the format YYYY-MM-DD
        Returns:
            - menus (DataFrame): DataFrame containing the menus for the given date range
    '''
    # Create an empty DataFrame
    menus = pd.DataFrame(columns=['date', 'name', 'restaurant', 'etudiant', 'doctorant', 'campus', 'visiteur', 'vegetarian'])
    menus['vegetarian'] = menus['vegetarian'].astype('bool')

    # Iterate over the date range
    for date in tqdm(pd.date_range(start_date, end_date)):
        day_menus = get_menu(date.strftime('%Y-%m-%d'))
        day_menus['date'] = date
        menus = pd.concat([menus, day_menus])
    return menus

I’ve run this function for the date range 2022-01-03 to 2023-08-31 to establish a baseline, and saved the results in a CSV file, data/menus.csv.
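
In code, that one-off baseline run looks roughly like this (a sketch, not the exact command that was run):

# One-off baseline run: scrape ~20 months of menus and persist them
menus = get_all_menus('2022-01-03', '2023-08-31')
menus.to_csv('data/menus.csv', index=False)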


Exploratory Data Analysis (EDA)

With our menu data in hand, it’s time for some EDA. We load the data and start by examining its structure:

import pandas as pd

menus = pd.read_csv('data/menus.csv')

We check for missing values and calculate summary statistics:

# Summary statistics
print("\nSummary Statistics:")
print(menus.describe())
           etudiant     doctorant        campus      visiteur
count  18038.000000  17642.000000  17542.000000  19849.000000
mean      10.423744     11.504200     11.897988     12.507411
std        3.815614      4.090794      3.968985      3.994251
min        0.000000      0.000000      0.000000      0.000000
25%        8.000000      9.000000      9.900000     10.000000
50%       10.000000     11.000000     12.000000     12.500000
75%       12.500000     13.500000     13.900000     14.900000
max       25.000000     25.000000     25.000000    122.000000

Key Insights from Summary Statistics

  • The mean price for a student meal is lower than that of a doctoral student, which is lower than that of a campus employee, which is lower than that of a visitor. This makes sense as students are on a tight budget, and visitors are willing to pay more for a meal.
  • The maximum price for a visitor meal is CHF 122, which is very high for a meal at an EPFL restaurant. This may be due to a typo by the restaurant; we’ll ignore this outlier from now on (a one-line filter is sketched after this list).
  • The minimum price for a meal is CHF 0 (free), which is suspicious for a meal at an EPFL restaurant. This may also be due to a typo, but it might be interesting to investigate further.
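
Dropping the outlier is a one-liner; the 100 CHF cutoff below is an arbitrary illustrative threshold, not a value from the original analysis:

# Keep rows whose visiteur price is missing or below an (arbitrary) 100 CHF cutoff
menus = menus[menus['visiteur'].isnull() | (menus['visiteur'] < 100)]
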
# Count the number of missing values in each column
missing_values = menus.isnull().sum()
print("\nMissing Values:")
print(missing_values)
Missing Values:
date             0
name             1
restaurant       0
etudiant      3212
doctorant     3608
campus        3708
visiteur      1401
vegetarian       0
dtype: int64

Key Insights from Missing Values

  • We can see that there are missing values in the etudiant, doctorant, campus, and visiteur columns but that’s due to the fact that not all meals are offered at all price points.
  • There is one missing value in the name column, which is odd but insignificant (we take a quick look at it after this list).
  • There are no missing values in the date, restaurant and vegetarian columns, which is good.
  • The number of missing values in the visiteur column is relatively low compared to the other columns. This is because most meals have only one price point, which is the visiteur price.
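
Out of curiosity, the row with the missing name is easy to pull up:

# Inspect the single row whose meal name is missing
print(menus[menus['name'].isnull()])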

Visualizing Vegetarian Proportions

One interesting aspect of the data is the proportion of vegetarian meals over time. To visualize this trend, we calculate the proportion of vegetarian meals for each date and plot the trendline:

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
import seaborn as sns

# Calculate the proportion of vegetarian meals for each date
menus_datetime = menus.copy()
menus_datetime['date'] = pd.to_datetime(menus_datetime['date'], format='%Y-%m-%d')
vegetarian_proportions = menus_datetime.groupby('date')['vegetarian'].mean().reset_index()

# Convert date to numeric values
vegetarian_proportions['date_numeric'] = pd.to_numeric(vegetarian_proportions['date'])

# Fit a polynomial regression model
coefficients = np.polyfit(vegetarian_proportions['date_numeric'], vegetarian_proportions['vegetarian'], 2)
poly = np.poly1d(coefficients)

# Create a regplot to visualize the trendline
plt.figure(figsize=(20, 6))
sns.scatterplot(data=vegetarian_proportions, x='date_numeric', y='vegetarian', s=50, label='Data')

# Create a smooth trendline using the polynomial model
x_values = np.linspace(vegetarian_proportions['date_numeric'].min(), vegetarian_proportions['date_numeric'].max(), 100)
y_values = poly(x_values)
plt.plot(x_values, y_values, color='red', label='Trendline')

plt.title("Trendline of Proportion of Vegetarian Meals Over Time")
plt.xlabel("Date")
plt.ylabel("Proportion")
plt.xticks(rotation=45)
plt.grid(True)

# X-ticks
ax = plt.gca()
ax.xaxis.set_major_locator(ticker.MaxNLocator(integer=True, prune='both'))

# Select approximately 10 evenly spaced date indices
date_indices = np.linspace(0, len(vegetarian_proportions) - 1, num=10, dtype=int)
selected_dates = vegetarian_proportions['date'].iloc[date_indices].dt.strftime('%Y-%m-%d')

# Set x-ticks and corresponding labels
ax.set_xticks(vegetarian_proportions['date_numeric'].iloc[date_indices])
ax.set_xticklabels(selected_dates, rotation=45)

# Adjust layout and display the plot
plt.tight_layout()
plt.show()
[Figure: trendline of the proportion of vegetarian meals over time]

Key Insights from Vegetarian Proportions

  • On average, the proportion of vegetarian meals appears stable over time (the fitted trend hovers around 40%), with a slight downward trend in recent months.
  • There are a few outliers with a proportion of 0% or 100% vegetarian meals that may be due to no/few meals being offered on that day. It may be worth investigating these outliers further.
  • Some days have a higher proportion of vegetarian meals than others. This may be due to the vegetarian days policy at EPFL, where some days are vegetarian-only. It may be worth investigating this further to confirm the hypothesis (a quick weekday breakdown is sketched after this list).
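
One quick probe of that hypothesis (a sketch, not part of the original analysis) is to average the vegetarian proportion per weekday; a vegetarian-days policy should show up as one or two weekdays with a clearly higher share:

# Average vegetarian proportion per weekday (0 = Monday, 6 = Sunday)
by_weekday = menus_datetime.groupby(menus_datetime['date'].dt.dayofweek)['vegetarian'].mean()
print(by_weekday)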

Visualizing Price Distributions

We visualize the distribution of meal prices for students (etudiants) using histograms and boxplots:

# Create a figure with subplots for the histogram and boxplot
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10), sharex=False)

# Calculate the statistics for etudiant meal prices
mean_etudiant_price = menus['etudiant'].mean()
median_etudiant_price = menus['etudiant'].median()
q25_etudiant_price = menus['etudiant'].quantile(0.25)
q75_etudiant_price = menus['etudiant'].quantile(0.75)

# Histogram for the price distribution of etudiant meals
sns.histplot(data=menus, x='etudiant', bins=20, kde=True, ax=ax1)
ax1.set_title("Price Distribution of Etudiant Meals")
ax1.set_ylabel("Count")
ax1.set_xlabel("Price (CHF)")
ax1.axvline(mean_etudiant_price, color='r', linestyle='--', label=f"Mean: {mean_etudiant_price:.2f}")
ax1.grid(True)

# Boxplot for the price distribution of etudiant meals
sns.boxplot(data=menus, x='etudiant', ax=ax2, orient='h')
ax2.set_xlabel("Price (CHF)")
ax2.axvline(mean_etudiant_price, color='r', linestyle='--', label=f"Mean: {mean_etudiant_price:.2f}")
statistics_text = f"Mean: {mean_etudiant_price:.2f}\nMedian: {median_etudiant_price:.2f}\nQ25: {q25_etudiant_price:.2f}\nQ75: {q75_etudiant_price:.2f}"
ax2.text(0.8, 0.8, statistics_text, transform=ax2.transAxes, bbox=dict(facecolor='white', alpha=0.5))
ax2.grid(True)

# Adjust layout and display the plot
plt.tight_layout()
plt.show()
[Figure: histogram and boxplot of student (etudiant) meal prices]

Key Insights from Price Distributions

  • The majority of student meal prices fall between roughly CHF 8 and CHF 12.50 (the interquartile range), with a few outliers at higher prices.
  • The mean student meal price is approximately CHF 10.42, with a median of CHF 10, which is high for a student budget.

Network Analysis

Now, let’s delve into network processing. We’ll clean the menu item names, remove stopwords, create a vocabulary, and identify bigrams (pairs of consecutive words) to represent relationships. We’ll then structure the data for network graph creation. Finally, we’ll visualize the network graph using D3.js.

Network Building

Data Preprocessing

We start by loading the dataset using Pandas:

import pandas as pd
import os

# Load the menu data from CSV
file_path = os.path.dirname(os.path.abspath(__file__))
menus = pd.read_csv(os.path.join(file_path, '..', 'data', 'menus.csv'), index_col=0)

One crucial preprocessing step is to clean the menu item names. We remove special characters, digits, and extra spaces. Additionally, we eliminate common stopwords such as “de,” “aux,” and “et” to focus on meaningful terms:

import re

def extract_tokens(text):
    if pd.isnull(text):
        return []

    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    text = re.sub(r'\d+', '', text)
    text = re.sub(r'\s+', ' ', text).strip()

    stopwords = [' ', 'de', 'aux', 'au', 'à', 'la', 'le', 'et', 'sur', 'du', 'chef', 'ou', 'notre', 'by', 'en']
    tokens = [token for token in text.split() if token not in stopwords]
    return tokens

tokens = menus['name'].apply(extract_tokens)
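
For instance, a typical (made-up) menu name reduces to its content words:

# Hypothetical menu name, for illustration only
extract_tokens('Filet de perche aux amandes')  # ['filet', 'perche', 'amandes']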

Building a Vocabulary

To create a network graph, we need to build a vocabulary from these cleaned tokens. We count the frequency of each term in the entire dataset:

def create_vocabulary(tokens):
    vocabulary = {}
    for token_list in tokens:
        for token in token_list:
            if token in vocabulary:
                vocabulary[token] += 1
            else:
                vocabulary[token] = 1
    return vocabulary

vocabulary = create_vocabulary(tokens)
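
To sanity-check the vocabulary, we can peek at its most frequent terms (collections.Counter would achieve the same in one line):

# Ten most frequent terms; the actual output depends on the scraped data
top_terms = sorted(vocabulary.items(), key=lambda kv: kv[1], reverse=True)[:10]
print(top_terms)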

Identifying Bigrams

Now that we have our vocabulary, we can move on to creating the network graph. We’ll filter the vocabulary to include only terms that occur frequently enough (e.g., at least 100 times) to reduce noise in our visualization:

min_frequency = 100
filtered_vocabulary = {k: v for k, v in vocabulary.items() if v >= min_frequency}

Next, we’ll generate bigrams (pairs of consecutive words) to represent relationships between terms. We calculate their frequencies and filter them based on our filtered vocabulary:

import nltk

finder = nltk.BigramCollocationFinder.from_documents(tokens)
bigram_measures = nltk.collocations.BigramAssocMeasures()
bigrams = list(finder.score_ngrams(bigram_measures.raw_freq))

# Filter bigrams using the filtered vocabulary
filtered_bigrams = []
for bigram in bigrams:
    if (bigram[0][0] in filtered_vocabulary.keys() and bigram[0][1] in filtered_vocabulary.keys()):
        new_bigram = bigram[0]
        filtered_bigrams.append((new_bigram, bigram[1]))

Graph Creation

With the filtered vocabulary and bigrams in place, we can structure the data for our network graph. We define vertices (nodes) and edges (connections) based on this data:

vertices = []
sizes = list(filtered_vocabulary.values())
for i, term in enumerate(filtered_vocabulary.keys()):
    vertices.append({
        'id': term,
        'label': term,
        'size': sizes[i]
    })

edges = []
for i, edge in enumerate(filtered_bigrams):
    (source, target) = edge[0]
    edges.append({
        'id': i,
        'source': source,
        'target': target,
        'size': edge[1]
    })
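
The D3.js front-end below reads the graph from static JSON files, so the final Python step is to serialize what we just built. A minimal sketch (the network/ paths match what the script fetches later):

import json

# Serialize the graph for the D3.js front-end
with open('network/vertices.json', 'w', encoding='utf-8') as f:
    json.dump(vertices, f, ensure_ascii=False)
with open('network/edges.json', 'w', encoding='utf-8') as f:
    json.dump(edges, f, ensure_ascii=False)
with open('network/bigrams.json', 'w', encoding='utf-8') as f:
    json.dump(filtered_bigrams, f, ensure_ascii=False)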

Network Visualization with D3.js

To visualize our network graph interactively, we turn to D3.js. We create an HTML page that integrates the D3.js script:

<!DOCTYPE html>
<html>
<head>
    <title>EPFL Restaurant Menu Network</title>

    <!-- JQuery -->
    <script src="https://code.jquery.com/jquery-3.4.1.min.js"></script>

    <!-- D3.js -->
    <script src="https://d3js.org/d3.v4.min.js"></script>
</head>
<body>
    <svg id="mynetwork"></svg>
    <div id="bigramsCard">
        <h3 class="title">Top Bigrams</h3>
        <div id="topbigrams"></div>
    </div>
</body>
<!-- Custom Script -->
<script type="module" src="./network/network.js"></script>
</html>

We add some CSS styling to the page:

html,
body {
    min-height: 100%;
    height: 100%;
    min-width: 100%;
    margin: 0;
    padding: 0;
    font-family: sans-serif;
}

#mynetwork {
    position: absolute;
    top: 0;
    left: max(20%, 200px);
    right: 0;
    bottom: 0;
    width: min(80%, calc(100% - 200px));
    height: 100%;
    background-color: white;
}

#bigramsCard {
    position: absolute;
    top: 0;
    left: 0;
    bottom: 0;
    width: max(20%, 200px);

    background-color: ghostwhite;
    border-radius: 5px;
    border: 1px solid lightgray;

    z-index: 100;
    padding: 0;
    margin: 0;
    overflow-y: auto;
}

#topbigrams {
    padding: 15px;
}

.title {
    margin: 0;
    margin-top: 20px;
    font-size: 1.5em;
    font-weight: 800;
    text-align: center;
}

@media screen and (max-width: 800px) {
    #mynetwork {
        left: 0;
        width: 100%;
        top: max(20%, 100px);
        height: min(80%, calc(100% - 100px));
    }

    #bigramsCard {
        top: 0;
        left: 0;
        right: 0;
        width: 100%;
        height: max(20%, 100px);
    }
}

Finally, we need to create a D3.js script network.js to visualize the network graph.

Data Loading

We start our script by loading the data from the JSON files that contain the network graph information.

fetch(`./network/edges.json`)
    .then(response => {
        if (response.status == 404) throw new Error('edges.json not found');
        return response.json();
    })
    .then(links => {
        fetch(`./network/vertices.json`)
            .then(response => {
                if (response.status == 404) throw new Error('vertices.json not found');
                return response.json();
            })
            .then(nodes => {
                // core script //
            })
            .catch(err => {
                console.error(err)
                console.error("No data for vertices")
            })
    })
    .catch(err => {
        console.error(err)
        console.error("No data for edges")
    })

Here we fetch two JSON files: edges.json and vertices.json, which contain information about the links (edges) and nodes of the graph, respectively. If the data cannot be loaded, error messages are logged.

Graph Visualization Setup

We set up the initial parameters and elements for the network graph visualization, such as the title, width, and height. title holds the title of the graph, while width and height are computed from the dimensions of the mynetwork container.

// Define variables for graph visualization
const title = 'Menus ingredients'
const width = $('#mynetwork').innerWidth()
const height = $('#mynetwork').innerHeight()

We then set up initial zoom settings, create a zoom handler function (zoom_handler), and create an SVG element (svg) inside the mynetwork container. The initial zoom level and zoom capabilities are defined here.

// Initial zoom settings
var initial_zoom = d3.zoomIdentity.translate(600, 400).scale(0.001);

// Add zoom capabilities
var zoom_handler = d3.zoom().on("zoom", zoom_actions);

// Create an SVG element
const svg = d3.select('#mynetwork')
    .attr('width', width)
    .attr('height', height)
    .call(zoom_handler)
    .call(zoom_handler.transform, initial_zoom)

Color Scale and Node Radius

We define the color scale and node radius for graph visualization. The color scale maps node sizes to colors, and the node radius defines the radius for nodes in the graph. To do so, we first calculate the maximum node size in the graph and define a color scale based on this value.

// Determine the maximum size value for node coloring
var max_value = 0
for (const node of nodes) {
    if (node.size > max_value) max_value = node.size;
}

// Define a color scale
var color = d3.scaleLinear()
    .domain([1, max_value])
    .range(["yellow", "red"])

// Define the radius for nodes
const radius = 10

Force Simulation

We then set up the force simulation for node positioning within the graph.

// Create a simulation for node positioning
var simulation = d3.forceSimulation()
    .force("link", d3.forceLink().id(function (d) { return d.id; }))
    .force("center", d3.forceCenter(width / 2, height / 2))
    .force("collide", d3.forceCollide().radius(d => { return d.size * 2 * radius }).iterations(3))
    .on("tick", ticked);

Here, the force simulation is defined with forces for links, centering, and collision detection. The ticked function is called to handle node positions during simulation ticks.

SVG Elements and Drag Handler

We create SVG groups for links and nodes and define a drag handler for the nodes.

// Create SVG groups for links and nodes
var zoomable = svg.append("g").attr("class", "zoomable").attr('transform', initial_zoom),
    link = zoomable.append("g").attr('class', 'links').selectAll(".link"),
    node = zoomable.append("g").attr('class', 'nodes').selectAll(".node")

// Create a drag handler
var drag_handler = d3.drag()
    .on("start", dragstarted)
    .on("drag", dragged)
    .on("end", dragended);

SVG groups for links and nodes are created, and the drag handler is defined for node interactions.

Graph Initialization

We then initialize the graph with nodes and links.

restart()

The restart function (part of the full network.js, omitted here for brevity) sets up the nodes, links, and their attributes based on the loaded data.

Title

We add a title to the graph for context.

// Add a title to the graph
svg.append('g')
    .append('text')
    .attr('class', 'title')
    .attr('x', width / 2)
    .attr('y', 50)
    .attr('text-anchor', 'middle')
    .text(title);

Tick Function and Drag Functions

Finally, we define the functions that handle simulation ticks and node dragging.

// Function for updating node and link positions during simulation ticks
function ticked() {
    // Update link positions
    link
        .attr("x1", function (d) { return d.source.x; })
        .attr("y1", function (d) { return d.source.y; })
        .attr("x2", function (d) { return d.target.x; })
        .attr("y2", function (d) { return d.target.y; });

    // Update node positions
    node
        .attr("transform", function (d) {
            return "translate(" + d.x + "," + d.y + ")";
        })
}

The ticked function updates the positions of links and nodes during each simulation tick.

function dragstarted(d) {
    // Start node dragging
    if (!d3.event.active) simulation.alphaTarget(0.3).restart();
    d.fx = d.x;
    d.fy = d.y;
}

function dragged(d) {
    // Continue node dragging
    d.fx = d3.event.x;
    d.fy = d3.event.y;
}

function dragended(d) {
    // End node dragging
    if (!d3.event.active) simulation.alphaTarget(0);
    d.fx = null;
    d.fy = null;
}

These functions handle node dragging interactions within the graph.

Top Bigrams

In addition to the network graph, we also want to display the top bigrams in the dataset. To do so, we fetch the bigrams.json file, sort the bigrams by frequency, and display the top 50 in the #topbigrams element of index.html, giving the user insight into the most common two-word combinations found in the menu item names.

fetch(`./network/bigrams.json`)
    .then(response => {
        if (response.status == 404) throw new Error('bigrams.json not found');
        return response.json();
    })
    .then(bigrams => {
        // List the top 50 bigrams in #topbigrams
        var topbigrams = bigrams.sort((a, b) => b[1] - a[1]).slice(0, 50)

        // Build the <ul> as a jQuery element so the <li> items end up inside it
        var list = $('<ul>')
        for (let bigram of topbigrams) {
            // bigram = [['word1', 'word2'], frequency]
            list.append(`<li>${bigram[0][0]}, ${bigram[0][1]} (${bigram[1].toFixed(0)})</li>`)
        }
        $('#topbigrams').empty().append(list)
    })
    .catch(err => {
        console.error(err)
        console.error("No data for bigrams")
    })

Automation with Cron Jobs

Automation is key to maintaining an up-to-date dataset. We create a Python script cron.py that runs daily and updates the menu data:

from utils import get_menu
import pandas as pd
import datetime

# Get today's date
today = datetime.date.today().strftime('%Y-%m-%d')

# Get the menu for today
today_menus = get_menu(today)

# Read the existing menus
menus = pd.read_csv('data/menus.csv')

# If date not in menus, add it
if today not in menus['date'].values:
    today_menus['date'] = today
    menus = pd.concat([menus, today_menus])
    menus.to_csv('data/menus.csv', index=False)

Scheduled Workflow with GitHub Actions

To ensure that our dataset remains up-to-date, we can automate the data update process using GitHub Actions. If you want to learn more, I made a dedicated post about Automating Data Scraping with GitHub Actions.

We create a scheduled workflow .github/workflows/cron_scrap.yml that runs the menu scraping script daily:

name: cron_scrap

on:
  schedule:
    - cron: "0 0 * * *"
  
jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: checkout repo content
        uses: actions/checkout@v3 # checkout the repository content to github runner.
        
      - name: setup python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
          
      - name: install python packages
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
                    
      - name: execute py script # run cron.py
        run: python cron.py

      - name: build network # run network.py
        run: python ./network/network.py
          
      - name: commit files
        run: |
          git config --local user.email "action@github.com"
          git config --local user.name "GitHub Action"
          git add -A
          git diff-index --quiet HEAD || (git commit -a -m "updated logs" --allow-empty)
                    
      - name: push changes
        uses: ad-m/github-push-action@v0.6.0
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          branch: main 

Conclusion

In this journey, we’ve gone from data scraping to network visualization. We’ve explored EPFL restaurant menus, conducted EDA, processed data for network graphs, and automated the data update process.

Data science and visualization projects like this demonstrate the power of Python, D3.js, and automation in uncovering insights and maintaining up-to-date datasets. The possibilities are endless when you combine data, code, and creativity.

Now, it’s your turn to explore your datasets, create captivating visualizations, and automate the data update process. Happy data journeying!