{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "t9lgYi7EPK2m" }, "source": [ "# 3 - Item Response Theory with Stan\n", "\n", "[](https://github.com/annabavaresco/ancm2024/blob/main/docs/week_3/3_IRT_Stan.ipynb)\n", "[](https://colab.research.google.com/github/annabavaresco/ancm2024/blob/main/docs/week_3/3_IRT_Stan.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "id": "z_sHGGTNPQcx" }, "source": [ "In this lab, you will explore item response theory and Bayesian modelling with the Stan language." ] }, { "cell_type": "markdown", "metadata": { "id": "QgZsGI6AOvoO" }, "source": [ "## Setup" ] }, { "cell_type": "markdown", "metadata": { "id": "TucqWWFKNWed" }, "source": [ "First, you need to install Stan. This may take several minutes :))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "HyCe6fys4bBM", "outputId": "57dd88a3-b216-457a-cc52-57cae76070d6" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CmdStan install directory: /root/.cmdstan\n", "Installing CmdStan version: 2.35.0\n", "Downloading CmdStan version 2.35.0\n", "Download successful, file: /tmp/tmpcn44248l\n", "Extracting distribution\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "DEBUG:cmdstanpy:cmd: make build -j1\n", "cwd: None\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Unpacked download as cmdstan-2.35.0\n", "Building version cmdstan-2.35.0, may take several minutes, depending on your system.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "DEBUG:cmdstanpy:cmd: make examples/bernoulli/bernoulli\n", "cwd: None\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Installed cmdstan-2.35.0\n", "Test model compilation\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "# Colab setup (courtesy of 
Justin Bois)\n", "# N.B. This cell may take several minutes to complete (3 mins on the instructor's machine)\n", "import os, sys, subprocess\n", "cmd = \"pip install --upgrade iqplot bebi103 arviz cmdstanpy watermark\"\n", "process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n", "stdout, stderr = process.communicate()\n", "import cmdstanpy; cmdstanpy.install_cmdstan()" ] }, { "cell_type": "markdown", "metadata": { "id": "HcDGuqKbJXDC" }, "source": [ "Next, you need to download the data and Stan template [here](https://drive.google.com/file/d/1BBeL2BtfTIBqMlJFTC_OZTUdT_pt9mpR/view?usp=share_link). Save it to your own Google Drive as in previous labs, and then mount your drive." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "XLNJTp_w4600", "outputId": "05b9bb76-98a0-4ccf-c30a-64dadade2a6a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mounted at /content/drive\n" ] } ], "source": [ "from google.colab import drive\n", "drive.mount('/content/drive')" ] }, { "cell_type": "markdown", "metadata": { "id": "Z2ObpzjxClnA" }, "source": [ "Unzip the files into a folder (you will be able to find this folder if you click the folder icon in your left sidebar):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "K3zIlfAh-t9o" }, "outputs": [], "source": [ "!unzip -qq '/content/drive/MyDrive/irt4ancm.zip'" ] }, { "cell_type": "markdown", "metadata": { "id": "J2wCXdewOSUh" }, "source": [ "The following cell prints a list of all of the segments used in the experiment, so that you can find and listen to the results. All of the audio was extracted from the official YouTube videos of the Eurovision Song Contest finals." 
] }, { "cell_type": "markdown", "metadata": { "id": "xeY9z78vO0Ct" }, "source": [ "## Background" ] }, { "cell_type": "markdown", "metadata": { "id": "VMgzZvvTO3Bm" }, "source": [ "The data in this lab come from the Eurovision Song Contest edition of the Hooked on Music experiment. You can try the experiment [here](https://app.amsterdammusiclab.nl/eurovision_2021). In this experiment, people were presented with segments from Eurovision songs and were asked if they'd ever heard the song. If their answer was 'yes', the song was muted for a few seconds and then went back on. In some trials, the song resumed at the right point. In others, it resumed a bit earlier or later. Participants were then also asked whether the second segment was the 'right' continuation for the first or not.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 576 }, "id": "ptYoGCMl7EI3", "outputId": "db6af206-e3a4-4d36-8e25-cd43b6408ead" }, "outputs": [ { "data": { "application/vnd.google.colaboratory.intrinsic+json": { "summary": "{\n \"name\": \"segment_df\",\n \"rows\": 437,\n \"fields\": [\n {\n \"column\": \"segment\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 126,\n \"min\": 1,\n \"max\": 437,\n \"num_unique_values\": 437,\n \"samples\": [\n 396,\n 79,\n 279\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"song\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 23,\n \"min\": 1,\n \"max\": 77,\n \"num_unique_values\": 77,\n \"samples\": [\n 5,\n 36,\n 11\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"country\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 40,\n \"samples\": [\n \"Portugal\",\n \"Serbia\",\n \"Azerbaijan\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"year\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 2016,\n 
\"max\": 2019,\n \"num_unique_values\": 4,\n \"samples\": [\n 2017,\n 2019,\n 2016\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"artist\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 76,\n \"samples\": [\n \"Frans\",\n \"Naviband\",\n \"Douwe Bob\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"title\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 77,\n \"samples\": [\n \"If I Were Sorry\",\n \"Historyja Majho Zyccia\",\n \"Slow Down\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"start_position\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 49.279014339093685,\n \"min\": 0.0,\n \"max\": 166.12,\n \"num_unique_values\": 403,\n \"samples\": [\n 27.098,\n 67.223,\n 147.616\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"segment_type\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"i\",\n \"v\",\n \"o\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}", "type": "dataframe", "variable_name": "segment_df" }, "text/html": [ "\n", "
| segment | song | country | year | artist | title | start_position | segment_type |
|---|---|---|---|---|---|---|---|
| 1 | 1 | Ukraine | 2016 | Jamala | 1944 | 0.000 | i |
| 2 | 1 | Ukraine | 2016 | Jamala | 1944 | 7.925 | v |
| 3 | 1 | Ukraine | 2016 | Jamala | 1944 | 39.500 | c |
| 4 | 1 | Ukraine | 2016 | Jamala | 1944 | 72.043 | v |
| 5 | 1 | Ukraine | 2016 | Jamala | 1944 | 132.559 | b |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 433 | 69 | Czechia | 2019 | Lake Malawi | Friend of a Friend | 78.128 | v |
| 434 | 70 | Denmark | 2019 | Leonora | Love Is Forever | 61.508 | v |
| 435 | 71 | Cyprus | 2019 | Tamta | Replay | 66.212 | v |
| 436 | 73 | Slovenia | 2019 | Zala Kralj & Gašper Šantl | Sebi | 70.698 | v |
| 437 | 75 | Serbia | 2019 | Nevena Božović | Kruna | 106.544 | v |

437 rows × 7 columns

|  | Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat |
|---|---|---|---|---|---|---|---|---|---|
| lp__ | -5525.360000 | 0.895102 | 26.427900 | -5568.770000 | -5525.080000 | -5482.130000 | 871.729 | 7.37628 | 1.003680 |
| mu_delta | 1.259720 | 0.008797 | 0.096447 | 1.102020 | 1.260350 | 1.420080 | 120.208 | 1.01716 | 1.033090 |
| sigma_theta | 1.673200 | 0.002529 | 0.074205 | 1.552020 | 1.670900 | 1.798820 | 861.049 | 7.28591 | 1.003320 |
| sigma_delta | 0.813807 | 0.001415 | 0.044432 | 0.743547 | 0.812273 | 0.889880 | 986.007 | 8.34326 | 1.001050 |
| theta[1] | -1.869670 | 0.013852 | 0.785863 | -3.246340 | -1.820170 | -0.692069 | 3218.810 | 27.23650 | 0.999585 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| delta[433] | 0.739824 | 0.007818 | 0.367495 | 0.144554 | 0.739232 | 1.343020 | 2209.490 | 18.69600 | 1.001850 |
| delta[434] | 1.223600 | 0.008058 | 0.379633 | 0.606406 | 1.212650 | 1.860400 | 2219.480 | 18.78050 | 1.000750 |
| delta[435] | 1.011530 | 0.008252 | 0.384406 | 0.382054 | 1.006030 | 1.650400 | 2169.800 | 18.36010 | 1.003760 |
| delta[436] | 1.620390 | 0.008185 | 0.380285 | 1.005200 | 1.610360 | 2.249280 | 2158.780 | 18.26690 | 1.003100 |
| delta[437] | 2.033020 | 0.009591 | 0.422329 | 1.374380 | 2.026730 | 2.750340 | 1938.970 | 16.40690 | 1.001530 |

936 rows × 9 columns
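
With 936 rows, it is hard to spot poorly mixing parameters by eye. The sketch below shows one possible way to flag them from the `R_hat` and `N_Eff` columns. It uses only a few rows copied from the summary above, and the cutoffs (`RHAT_MAX = 1.01`, `NEFF_MIN = 400`) and the helper name `flag_poor_mixing` are assumptions made for illustration, not part of the lab.

```python
# A few rows copied from the summary table above (not the full 936-row output).
summary_rows = {
    "lp__": {"N_Eff": 871.729, "R_hat": 1.003680},
    "mu_delta": {"N_Eff": 120.208, "R_hat": 1.033090},
    "sigma_theta": {"N_Eff": 861.049, "R_hat": 1.003320},
    "sigma_delta": {"N_Eff": 986.007, "R_hat": 1.001050},
    "theta[1]": {"N_Eff": 3218.810, "R_hat": 0.999585},
    "delta[437]": {"N_Eff": 1938.970, "R_hat": 1.001530},
}

# Assumed cutoffs for this sketch; pick values that suit your own analysis.
RHAT_MAX = 1.01
NEFF_MIN = 400


def flag_poor_mixing(rows, rhat_max=RHAT_MAX, neff_min=NEFF_MIN):
    """Return the parameters whose R_hat is too high or whose N_Eff is too low."""
    return {
        name: stats
        for name, stats in rows.items()
        if stats["R_hat"] > rhat_max or stats["N_Eff"] < neff_min
    }


flagged = flag_poor_mixing(summary_rows)
for name, stats in flagged.items():
    print(f"{name}: R_hat={stats['R_hat']:.3f}, N_Eff={stats['N_Eff']:.0f}")
# prints: mu_delta: R_hat=1.033, N_Eff=120
```

Among the rows shown, only `mu_delta` is flagged. If the summary is held in a pandas DataFrame, the same condition could presumably be written as a boolean mask over the `R_hat` and `N_Eff` columns.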
\n", "