# Probabilistic Models of Cognition

# Table of Contents

- Probabilistic Models of Cognition - Notebook
- Short Review
- WebPPL
- Exercise Solutions
- Supplementary Readings

# Probabilistic Models of Cognition - Notebook

## Short Review

Probabilistic Models of Cognition is an open-source web textbook which teaches an introduction to the probabilistic approach to cognitive science. A major component of this book is introducing *probabilistic programming languages*, which are programming languages with probabilistic/random primitives. Probmods (in its current form) does this in WebPPL, which is basically "probabilistic functional javascript", but some of the papers it references instead use Church, which is a probabilistic dialect of LISP. Both of these are moderately horrible to write programs in, but I do think they accomplish something pretty neat – they allow you to easily make inferences about a situation, which in turn allows you to make inferences *about other inferences*.

The core thesis of this book is as follows: Theory Theory posits that humans learn by generating some hypotheses about how things work, and then updating which theories to use based on evidence accumulated over time. You can model these theories using generative models, even for infinite hypothesis spaces (i.e. the set of all arithmetic expressions); and you can relatively precisely model this "updating" with bayesian learning (i.e. inference in webPPL), meaning you can get pretty powerful models of human behavior using this framework. The entire rest of the book, after introducing these concepts, extends this to different areas of human cognition such as intuitive physics, reasoning, social cognition, natural language process, developmental psychology, etc.

This book is definitively unfinished, but for what it is right now I very much enjoyed working through it! The exercises were mostly fun and engaging, the readings were interesting and only became more interesting as time went on, and I was able to use webPPL to do cool cogsci-type things relatively quickly after I started reading it. Debugging things in WebPPL is sort of a pain (literally nobody uses it), but I do think it's a pretty interesting language and it's used in a really clear way in this book so I was willing to put up with it.

This book is structured in what I consider to be somewhat weird order, with some of the exercises requiring the reader to sort of "invent" things that are presented to them more cleanly in later chapters. For example: the Conditioning reading introduces the idea of graph surgery / intervention, but this isn't demonstrated in webPPL until the dependence chapter, despite being required during the exercises for the chapter before it. I feel I need to take some of this with a some number of grains of salt, though, since the book is clearly a draft - there are a number of typos and several of the links are broken (e.g. a lot of the readings pages), so I feel that judging an unfinished book for it's weird ordering is perhaps not the most fair criticism in the world. Likewise, I felt mildly confused at points because I'm fairly certain this book is specifically written for (I imagine) Noah Goodman's class on this topic. This by itself is fine, but sometimes the exercises say things like "use the function we defined in class" which leaves me scratching my head at times. This is usually pretty workable, but I bet it's even better if you have the direct instruction that feels like it accompanies this.

Overall I think probmods is a good resource; I think it is likely pretty difficult for a student who is coming from a less technical corner of cogsci, but I think all the topics are presented from a fairly unassuming starting point. I think I would consider this a graduate-level text, but more due to breadth reasons rather than depth ones – most technical students can probably get the hang of webPPL very easily, and most social-science-oriented students probably know a good amount about what is actually being modeled and why.

I would definitely recommend it! A very fun read.

## WebPPL

WebPPL is horrible, and it frustrates me how useful it is considering writing programs in it was remarkably unpleasant the entire time. I joked with a colleague about how my choice between church and webPPL was "essentially, probabilistic lisp vs functional javascript" and the look of abject horror on his face roughly describes how I felt working through this textbook.

I think webPPL is probably better labeled an alpha release of a language, and unless church is even worse I feel switching the textbook to it was likely premature (although working on exercises in-browser was admittedly very nice). The error messages are rarely (if ever) helpful, and I lost a decent amount of time on things that I feel shouldn't have taken me as long to debug as they should have.

Here's a quick example: I had to write a function for the Mixture Models exercises which included a function where I return a dictionary representing a prototype with a three probabilities associated with three separate features. My first attempt tried to draw three values from a 3d dirichlet (which is a multivariate extension of the beta distribution).

var sampleGroupPrototype = mem(function(groupName) { var attributes = ['antennae', 'green', 'blarghNoise'] var alpha = ones([3, 1]) var distros = dirichlet({alpha: alpha}).data return _.zipObject(attributes, distros) })

and I later observe upon these values:

var results = Infer({method: 'MCMC', kernel: {HMC: {steps: 10, stepSize: .01}}, samples: 3000}, function(){ mapData({data: data}, function(datum) { var alienName = flip() ? 'group1' : 'group2' var prototype = sampleGroupPrototype(alienName) mapData({data: properties}, function(property) { observe(Bernoulli({p: prototype[property]}), datum[property]) }) }) return {group1: sampleGroupPrototype('group1'), group2: sampleGroupPrototype('group2')} })

However, if you run this code, everything explodes, saying that p should be of type real[0,1] (which it is). When you comment out the observation, you get a more helpful error which says "HMC does not yet support the Dirichlet distribution". So, instead, I made it just return three beta distributions instead of a single dirichlet distribution, which is apparently super fair game.

var sampleGroupPrototype = mem(function(groupName) { var attributes = ['antennae', 'green', 'blarghNoise'] var distros = repeat(3, function(){ return beta({a: 1, b: 1})}) return _.zipObject(attributes, distros) })

Working in this language is a giant pain because errors / confusing errors like this are very common, and there are no stackoverflow questions on these sorts of topics since virtually nobody uses this. I had some trouble! Perhaps this is my unreasonableness here but it made me appreciate all the time I spend in other languages a bit more.

That said, you can do some useful stuff in very few lines of code using webPPL, so don't let it put you off the book completely. Just be warned, it's not always sunshine and rainbows working in this language.

## Exercise Solutions

### Generative Models

#### 1

b) (replace function w/ whatever)

var a1 = function(){return flip() ? flip(.7) : flip(.1);}; var a = repeat(1000, a1); viz.hist(a);

c)

var d1 = function(){return flip(.8) ? flip() : false;}; var d = repeat(1000, d1); viz.hist(d);

#### 2

a)

foo in the first one is the result of a flip, not a function; the second one flips every time you call it

b)

var foo = mem(function() {return flip()}); display([foo(), foo(), foo()]);

c) super overcomplicated this lol

var foo = mem(function(x) {return flip()}); display([foo(0), foo(0), foo(1)]);

#### 3

a) the actual calculations come out to .7h .3t for program A and something like 0.729h 0.271t for program B, so a perfect 7:3 split sounds more lika program A

b) program B couldve also produced that, sure, it's within the range of random variance (since it's also approximately 7:3) but it's just less likely.

#### 4

a) P(sneezing) = .14 + .24 + .06 = 0.44; P(sneezing)P(fever) = 0.2

b)

Infer({method: "forward", samples: 1000}, function() { var allergies = flip(0.3); var cold = flip(0.2); var sneeze = cold || allergies; var fever = cold return [sneeze && fever]; })

c) the mems are the key for this one

var allergies = mem(function(person) {return flip(.3)}); var cold = mem(function(person) {return flip(.2)}); var sneeze = function(person) {return cold(person) || allergies(person)} var fever = function(person) {return cold(person)} display([sneeze('bob'), sneeze('alice'), fever('bob'), fever('alice')])

#### 5

a) 0.4h 0.6t b)

var makeCoin = function(weight) { return function() { return flip(weight) ? 'h' : 't' } } var bend = function(coin) { return function() { return coin() == 'h' ? makeCoin(.7)() : makeCoin(.1)() } } var fairCoin = makeCoin(.5) var bentCoin = bend(fairCoin) Infer({method: "forward", samples: 1000}, bentCoin);

#### 6

a) it seems like it would just be (1-p)^{5}*p
b)

var geometric = function(p) { return flip(p) ? 0 : 1 + geometric(p) }; Infer({method: "forward", samples: 1000}, geometric);

#### 7

a) / b)

var c = function() { var a = flip(.8); var b = function(a) { flip(a ? 0.5 : 0.3); }; return [a, b(a)]; } Infer({method: "forward", samples: 1000}, c);

#### 8

a) flipsequence changes the probability of h and t from p and 1-p (respectively) to some function of p^{2} and (1-p)^{2} which i am too lazy to figure out exactly; since if the coin is slanted in one direction more than the other it gets even more unlikely to yield the unlikely roll twice in a row, compared to getting the most likely outcome twice in a row (repeating until one of the two happens)
b) fairCoin just yields 50/50 since there is no slant to exacerbate

#### 9

var ground = {shape: 'rect', static: true, dims: [worldWidth, 10], x: worldWidth/2, y: worldHeight} var rect = {shape: 'rect', static: false, dims: [10, 100], x: worldWidth/2, y: 390} var orb = {shape: 'circle', static: false, dims: [10, 10], x: 0, y: 300, velocity: [1000, 0]} var bowlingWorld = [ground, rect, orb] physics.animate(1000, bowlingWorld);

### Conditioning

#### 1

a)

*notes: dunno what the 'H' was for, changed it*

var model = function() { var coin = flip() return coin } var log_prob = Infer({method:'enumerate'}, model).score(true) Math.exp(log_prob)

b)

var model = function() { var truecoin = flip() var A = flip(truecoin ? .5 : .9) var B = flip(truecoin ? .5 : .9) var C = flip(truecoin ? .5 : .9) condition(A + B == 2) return C } viz(Infer({method:'enumerate'}, model))

c)

the coin was biased 85% of the time (proof: change above condition to A + B + C == 3 and return truecoin)

d)

the coin will be heads 60% of the time (proof same as part c)

#### 2

a)

intervention and conditioning are the same in this scenario because there is only one observation + lungCancer and cold are conditionally independent; i.e. setting one to true doesn't give us information about the other

b)

var model = function() { var smoker = flip(.15) var yellowhands = (smoker ? flip(.95) : flip(.05)) condition(yellowhands) return {'smoke': smoker, 'yellow': yellowhands} } var model2 = function() { var smoker = flip(.15) var yellowhands = flip(.999) return {'smoke': smoker, 'yellow': yellowhands} } viz(Infer({method:'enumerate'}, model)) viz(Infer({method:'enumerate'}, model2))

conditioning on situations where people have yellow hands allows you to infer the likelihood they are smokers; it does *not* affect the likelihood that the person is a smoker if you intervene and dye their hands yellow / otherwise adjust the prior of the causal effect of smoking.

#### 3

a)

possible are not a or b, a but not b, b but not a, and a and b. of these, three satisfy a || b, and of those a is true 2/3 of the time.

b)

p(alice is nice) = 0.7 p(alice smiles | she is nice) = 0.8 p(alice smiles | she is not nice) = 0.5 p(alice smiles) = (p(alice smiles|she is nice)p(alice is nice) + p(alice smiles|she is not nice)p(alice is not nice)) = 0.71 p(alice is nice | she smiles) = (p(alice smiles | she is nice) p(alice is nice)) / p(alice smiles) = .7887

p(alice is nice | she smiles *again*) = … = .8886

#### 4

a)

given that alice smiles twice (and bob randomly also smiles), what is the probability that she is nice?

b)

var extendedSmilesModel = function() { var nice = mem(function(person) {return flip(.7)}); var wantsomething = function(person) {return nice(person) ? flip(.8) : flip(.3)} var smiles = function(person) { var A = nice(person) ? flip(.8) : flip(.5); var B = wantsomething(person) ? flip(.8) : flip(.5); return A || B; } return smiles('alice') } Infer({method: "enumerate"}, extendedSmilesModel)

c)

this doesn't feel particularly correct, probably revisit this later

var smilesModel = function() { var nice = mem(function(person) {return flip(.7)}); var wantsomething = function(person) {return nice(person) ? flip(.8) : flip(.3)} var smiles = function(person) { var A = nice(person) ? flip(.8) : flip(.5); var B = wantsomething(person) ? flip(.9) : flip(.2); return A || B; } condition(!smiles('bob') && !smiles('bob') && !smiles('bob') && !smiles('bob') && !smiles('bob') && smiles('bob')) return wantsomething('bob'); } viz(Infer({method: "enumerate"}, smilesModel))

#### 5

a)

var watermodel = function() { var sprinklerworked = flip() var it_rained = flip(.3) condition(sprinklerworked || it_rained) return {it_rained, sprinklerworked} } viz(Infer({method: "enumerate"}, watermodel))

p(it rained) ~= .46 p(sprinkler worked) ~= .76

b)

var watermodel = function() { var sprinklerworked = mem(function(n) {flip()}) var it_rained = mem(function() {flip(.3)}) var mylawnwet = sprinklerworked(1) || it_rained() var kelseylawnwet = sprinklerworked(2) || it_rained() condition(kelseylawnwet && mylawnwet) return it_rained() } print(Infer({method: "enumerate"}, watermodel))

p(it rained | both our lawns are wet) = 0.632

var watermodel = function() { var sprinklerworked = mem(function(n) {flip()}) var it_rained = mem(function() {flip(.3)}) var all_lawns_wet = function(num) { if (num == 0){ return 1 } else{ return (sprinklerworked(num) || it_rained()) ? all_lawns_wet(num-1) : 0 } } condition(all_lawns_wet(5) == true) return it_rained() } print(Infer({method: "enumerate"}, watermodel))

#### 6

a)

The probability that you will get a given letter given that you win the game

b)

p(win | h) = 1, .25, .11, .0625 p(h)p(win | h) = .05, .1125, .00555, .028 p(win) 0.19605 p(h | win) = .255, .573, 0.028, .143

c)

// define some variables and utility functions var checkVowel = function(letter) {return _.includes(['a', 'e', 'i', 'o', 'u'], letter);} var letterVals = ['g', 'a', 'm', 'e']; var letterProbs = map(function(letter) {return checkVowel(letter) ? 0.45 : 0.05;}, letterVals); var letters = Categorical({vs: letterVals, ps: letterProbs}) // Compute p(h | win) var distribution = Infer({method: 'enumerate'}, function() { var letter = sample(letters); var position = letterVals.indexOf(letter) + 1; var winProb = 1 / Math.pow(position, 2); condition(flip(winProb)) return letter }); viz.auto(distribution);

d)

// define some variables and utility functions var checkVowel = function(letter) {return _.includes(['a', 'e', 'i', 'o', 'u'], letter);} var letterVals = ['g', 'a', 'm', 'e']; var letterProbs = map(function(letter) {return checkVowel(letter) ? 0.45 : 0.05;}, letterVals); var letters = Categorical({vs: letterVals, ps: letterProbs}) // Compute p(h | win) var distribution = Infer({method: 'enumerate'}, function() { var letter = sample(letters); var position = letterVals.indexOf(letter) + 1; var winProb = 1 / Math.pow(position, 2); condition(flip(winProb)) return checkVowel(letter) }); viz.auto(distribution);

p(vowel | win) is higher

e)

assuming this question is with regards to bayes rule vs the webppl implementation of these? The mathematical notation is a lot more concise but the code scales easier to larger hypothesis spaces

### Dependence

I think most of these are pretty clearly either statistically dependent or not but I'll do them anyways moreso for webPPL practice; I drew the nets in a notebook since I was just doing this independently.

#### 1

a)

a and b are not causally dependent (they are both highest nodes), nor are they statistically dependent (they are both fair coins)

var questA = function(Aval) { return Infer({method: 'enumerate'}, function() { var a = flip() var b = flip() var c = flip(a && b ? .8 : .5) return {b: b} }) } viz(questA(true)) viz(questA(false))

b)

a and b are causally dependent (b calls a to evaluate), and statistically dependent (information about one provides information about the other)

var questB = function(Aval) { return Infer({method: 'enumerate'}, function() { var a = flip() var b = flip(a ? .9 : .2) var c = flip(b ? .7 : .1) condition(a == Aval) return {b: b} }) } viz(questB(true)) viz(questB(false))

c)

this is the same as exercise b, except c depends on a instead of b, which is irrelevant for the listed questions other than drawing the graph

var questC = function(Aval) { return Infer({method: 'enumerate'}, function() { var a = flip() var b = flip(a ? .9 : .2) var c = flip(a ? .7 : .1) condition(a == Aval) return {b: b} }) } viz(questC(true)) viz(questC(false))

d)

a and b are causally linked, since 50% of the time b depends on the value of a (since z depends on a 50% of the time); and the two are statistically dependent

var questD = function(Aval) { return Infer({method: 'enumerate'}, function() { var a = flip(.6) var c = flip(.1) var z = flip() ? a : c; var b = z ? 'foo' : 'bar' condition(a == Aval) return {b: b} }) } viz(questD(true)) viz(questD(false))

e)

a and b are not causally linked (alice passing does not affect whether or not bob passes) but they *are* statistically dependent since they share a common cause (i.e. whether or not the exam was fair)

var questE = function(Aval) { return Infer({method: 'enumerate'}, function() { var examFairPrior = Bernoulli({p: .8}) var doesHomeworkPrior = Bernoulli({p: .8}) var examFair = mem(function(exam) {return sample(examFairPrior)}) var doesHomework = mem(function(student) {return sample(doesHomeworkPrior)}); var pass = function(student, exam) { return flip(examFair(exam) ? (doesHomework(student) ? .9 : .5) : (doesHomework(student) ? .2 : .1)); } var a = pass('alice', 'historyExam'); var b = pass('bob', 'historyExam'); condition(a == Aval) return {b: b} }) } viz(questE(true)) viz(questE(false))

### Conditional Dependence

The readings page is broken, so I assume there are no readings associated with this relatively short homework

#### 1

Infer({method: 'enumerate'}, function() { var cancer = flip(0.00001); var cold = flip(0.2); var dead = (cancer && flip(0.9)) || (cold && flip(0.00006)) || flip(0.000000001) return dead });

a)

p(cancer | dead, cold) = .13 p(cancer | dead, !cold) = .999 p(cancer | dead) = .429 p(cancer) = 0.00001

An example of explaining away here is knowing that you do/do not have a cold dramatically altering the probability of cancer given that you are dead. If there's a ~40% chance you died from cancer, knowing you had a cold explains away a good amount of the evidence (i.e. observing the common cause makes them statistically dependent, which makes observing cold or !cold change the p(cancer)

var check = Infer({method: 'enumerate'}, function() { var cancer = flip(0.00001); var cold = flip(0.2); var dead = (cancer && flip(0.9)) || (cold && flip(0.00006)) || flip(0.000000001) condition(dead && cold) return cancer }); print(check)

b)

p(cold | dead, cancer) = 0.2 p(cold | dead, !cancer) = 0.999 p(cold) = 0.2 p(cold | dead) = 0.657

An example of explaining away here is knowing that you do/do not have cancer dramatically altering the probability of having a cold given that you are dead. If there's a ~66% chance you died from a cold, knowing you had cancer explains away virtually all of the evidence (i.e. observing the common cause makes them statistically dependent, which makes observing cancer or !cancer change the p(cold)

var check = Infer({method: 'enumerate'}, function() { var cancer = flip(0.00001); var cold = flip(0.2); var dead = (cancer && flip(0.9)) || (cold && flip(0.00006)) || flip(0.000000001) condition(dead && !cancer) return cold }); print(check)

### Bayesian Data Analysis

Again, it seems there are no readings for this chapter.

#### 1

a)

Beta({a: 10,b: 10}) captures a gaussian-like distribution around 0.5

Beta({a: 1, b: 5}) captures a left-dense distribution with a long tail extending to higher values

Beta({a: 0.1, b: 0.1}) is a U like shape with two dense areas around 0.05 and around 0.95, pointed more towards 0.95

These three generate relatively variant final distributions based on the same data; the first generates a gaussianish predictive distribution around 0.5, the second and third yield exponential-ish predictive distributions.

b)

*Honestly I'm not sure I'm following; does this question just want me to change n?*

// observed data var k = 1 // number of successes var n = 20 // number of attempts var n2 = 5 //experiment 2 var priorDist = Beta({a:1, b:1}); var model = function() { var p = sample(priorDist); // Observed k number of successes, assuming a binomial observe(Binomial({p : p, n: n}), k); // sample from binomial with updated p var posteriorPredictive = binomial(p, n2); // sample fresh p (for visualization) var prior_p = sample(priorDist); // sample from binomial with fresh p (for visualization) var priorPredictive = binomial(prior_p, n); return { prior: prior_p, priorPredictive : priorPredictive, posterior : p, posteriorPredictive : posteriorPredictive }; } var opts = {method: "MCMC", samples: 2500, lag: 50}; var posterior = Infer(opts, model); viz.marginals(posterior)

#### 2

a)

Most of the probability mass is collecting in lower areas so it seems more sensible to proceed with data collection unless changing it would be a very easy exercise or collecting another person's results would be prohibitively expensive.

b)

the straightforward answer is that we don't actually have any evidence at all that the task is hard (other than just our prior), but we *do* have evidence that the task is fairly easy. MAP might be more useful in situations where we have enough data to converge upon a good value, but here the MAP is so high merely because we were way overconfident that the task was too hard.

From the metaphor of theory/model evaluation, it seems more effective to consider a model's output as where it puts the most density; if probabilities are your racehorse bet money, then you'll make the most money if most of your money is in correct places, compared to if the outcome you placed the most money on wins. So if you imagine you can bet on horses from two teams: "team impossible" and "team ezpz". you think "Show me the money" on team impossible is the favorite, but that doesn't necessarily mean you should automatically beat on team impossible, since it's possible that the second most likely horse to win is on ezpz (and for that matter, the third and fourth and fifth and sixth…) Picking MAP is all well and good for well-formed distributions but for these divergent/multimodal cases it's a lot more unclear and you need to weigh the pros and cons.

#### 3

a)

If alice and bob win as a team, you might think alice is strong, but if bob wins by himself later you can explain away alice's skill somewhat

b)

one thing that comes to mind is when players are stronger as a unit than they are separately, seems relatively unmodeled here.

c)

lazy pulling dividing your strength in half can definitely be changed to a parameter since that's very extreme

d)

(sort of confused by the wording?) from WebPPL's perspective these are both just models with parameters that can be inferred from data, so in that sense webppl can pull out these values to compare them(?)

#### 4

a)

the parameters of this model are

- probability an object is a blicket
- probability that a blicket sets off the machine
- probability that a nonblicket sets off the machine
- probability that the machine randomly goes off

b)

the infer statement in dataAnalysis returns the posterior distribution over parameters the infer statement in detectingBlickets returns the posterior predictive distribution there are two queries in this program because you need to know what data to expect as well as what values are most likely in the model

c)

blicketbaserate = the likely probability that an object is a blicket (gaussian centered around 0.38) blicketpower = the probability that a blicket will turn on the machine (roughly linear) nonblicketpower = the probability that a nonblicket turns on the machine (distribution roughly exponential) machineSpontaneouslygoesoff = the probability the machine turns on randomly (also roughly exponential) predictive cond1 = p(A) (centered around 0.8) predictive cond2 = p(A, B) (centered around 0.55) predictive cond3 = p(A,B; B) (centered around 0.45) predictive cond4 = p(A,B; A,B) (centered around 0.66) predictive cond5 = p([]) (centered around 0.35) scatter = predictive summary comparing data analysis and summary of data

d)

They have the most density around the original values, but with different probability distributions

e)

both of these were assumed to be uniform from 0 to 1 before running the experiment, which isn't entirely necessary; it would have probably been justified to assume that blicket detectors are better than chance at activating when blickets are placed upon it, and that non-blickets probably do not usually set of the detector. Imposing true uniformity upon these two assumes all possible configurations of blicket detectors are equally likely, including an "anti-blicket-detector" in which non-blickets almost always turn on the blicket detector, and blickets never do.

The posteriors directly tell us that this is likely an unnecessary (read: wrong) assumption; the blicket power is very high, the non-blicket power is very low, but the model arrived at this conclusion anyways since it was given enough data to learn it itself.

f)

the maximum a-posteriori value is not particularly well modeled by the posterior predictive; and the scatter plot runs upwards but doesn't really fit the y = x line. This would suggest to us that the MAP is not a well-predicted point estimate of the posterior predictive distribution, which is a negative point to analyzing our model-data fit.

g)

//fold: ... var params = { blicketBaseRate : .38, blicketPower: .95, nonBlicketPower: .01, machineSpontaneouslyGoesOff: .02 }; var bestFitModelPredictions = map(function(evidence) { return Math.exp(detectingBlickets(evidence, params).score(true)); }, possibleEvidenceStream) viz.scatter(bestFitModelPredictions, dataSummary(data))

h)

The fit is better if we look at MAP values directly, and the previous values seem to be converging upon these values anyways (using MAP here just seems to skip to the end, so to speak). There seems to be some relationships that are not yet properly captured (but might be with better tuning and/or more data)

### Algorithms for Inference

#### 1

a)

// takes z = 0 cross section of heart surface to some tolerance // see http://mathworld.wolfram.com/HeartSurface.html var onCurve = function(x, y) { var x2 = x*x; var term1 = y - Math.pow(x2, 1/3); var crossSection = x2 + term1*term1 - 1; return Math.abs(crossSection) < 0.01; }; var xbounds = [-1, 1]; var ybounds = [-1, 1.6]; var xmu = 0.5 * (xbounds[0] + xbounds[1]); var ymu = 0.5 * (ybounds[0] + ybounds[1]); var xsigma = 0.5 * (xbounds[1] - xbounds[0]); var ysigma = 0.5 * (ybounds[1] - ybounds[0]); var model = function() { var x = gaussian(xmu, xsigma); var y = gaussian(ymu, ysigma); condition(onCurve(x, y)); return {x: x, y: y}; }; var post = Infer({method: 'MCMC', kernel: 'MH', samples: 50000}, model); viz.auto(post);

Doesn't work as well because MH samples from gaussians around the current point, so all of the points come close to each other and don't grab the full space of the curve as efficiently as rejection sampling does. Once MH finds a point, it will fix it in place and look only nearby, which is why instead of getting the full heart (through truer random) you get patchy clusters.

b)

// takes z = 0 cross section of heart surface to some tolerance // see http://mathworld.wolfram.com/HeartSurface.html var onCurve = function(x, y) { var x2 = x*x; var term1 = y - Math.pow(x2, 1/3); var crossSection = x2 + term1*term1 - 1; return Math.abs(crossSection) < 0.01; }; var xbounds = [-1, 1]; var ybounds = [-1, 1.6]; var xmu = 0.5 * (xbounds[0] + xbounds[1]); var ymu = 0.5 * (ybounds[0] + ybounds[1]); var xsigma = 0.5 * (xbounds[1] - xbounds[0]); var ysigma = 0.5 * (ybounds[1] - ybounds[0]); var model = function() { var pt = Vector([xmu, ymu]); var sg = Vector([xsigma, ysigma]) var b = sample(DiagCovGaussian({mu: pt, sigma: sg})) var x = T.get(b, 0) var y = T.get(b, 1) condition(onCurve(x, y)); return {x: x, y: y}; }; var post = Infer({method: 'MCMC', kernel: 'MH', samples: 50000}, model); viz.auto(post);

c)

// takes z = 0 cross section of heart surface to some tolerance // see http://mathworld.wolfram.com/HeartSurface.html var onCurve = function(x, y) { var x2 = x*x; var term1 = y - Math.pow(x2, 1/3); var crossSection = x2 + term1*term1 - 1; return Math.abs(crossSection) < 0.01; }; var xbounds = [-1, 1]; var ybounds = [-1, 1.6]; var xmu = 0.5 * (xbounds[0] + xbounds[1]); var ymu = 0.5 * (ybounds[0] + ybounds[1]); var xsigma = 0.5 * (xbounds[1] - xbounds[0]); var ysigma = 0.5 * (ybounds[1] - ybounds[0]); var model = function() { var x = gaussian(xmu, xsigma); var y = gaussian(ymu, ysigma); condition(onCurve(x, y)); return {x: x, y: y}; }; var post = Infer({method: 'MCMC', kernel: {HMC: {steps: 25, stepSize: .05}}, samples: 10000}, model); viz.auto(post);

HMC seems to work better because it's less local and also feasible for the general differentiability of the model in question. It's more spread around the heart because the proposals made to the random choices are coordinated with respect to the gradient, rather than searching for points on the curve by leveraging points we already know to be on the curve (as in MH)

#### 2

a)

var interpolate = function(point1, point2, interpolationWeight) { return (point1 * interpolationWeight + point2 * (1 - interpolationWeight)) } var model = function(){ var point1 = -10; var point2 = uniform(-100,100); var interpolationWeight = uniform(0,1); var pointInMiddle = interpolate(point1, point2, interpolationWeight); observe(Gaussian({mu: 0, sigma:0.1}), pointInMiddle) return {point2, interpolationWeight, pointInMiddle} } var posterior = Infer({ method: 'MCMC', samples: 5000, lag: 100, }, model) var samples = posterior.samples; viz(marginalize(posterior, function(x) {return x.pointInMiddle})) viz(marginalize(posterior, function(x) {return x.point2})) viz(marginalize(posterior, function(x) {return x.interpolationWeight})) viz(marginalize(posterior, function(x) {return {2: x.point2, i: x.interpolationWeight}})) // Store these for future use editor.put("posterior", posterior) editor.put("samples", samples)

These two variables are correlated

b)

var interpolate = function(point1, point2, interpolationWeight) { return (point1 * interpolationWeight + point2 * (1 - interpolationWeight)) } var model = function(){ var point1 = -10; var point2 = uniform(-100,100); var interpolationWeight = uniform(0,1); var pointInMiddle = interpolate(point1, point2, interpolationWeight); observe(Gaussian({mu: 0, sigma:0.1}), pointInMiddle) return {point2, interpolationWeight, pointInMiddle} } var posterior = Infer({ method: 'MCMC', samples: 50, lag: 0, }, model) var samples = posterior.samples; var middlepoint = function(d) { return d["value"]["pointInMiddle"] } var s = map(middlepoint, samples) viz.line(_.range(s.length), s) // Store these for future use editor.put("posterior", posterior) editor.put("samples", samples)

It will start someplace random but then converge to 0.0

c)

var interpolate = function(point1, point2, interpolationWeight) { return (point1 * interpolationWeight + point2 * (1 - interpolationWeight)) } var model = function(){ var point1 = -10; var point2 = uniform(-100,100); var interpolationWeight = uniform(0,1); var pointInMiddle = interpolate(point1, point2, interpolationWeight); condition(Math.abs(pointInMiddle) < 0.1) return {point2, interpolationWeight, pointInMiddle} } var posterior = Infer({ method: 'rejection', samples: 1000 }, model) viz(marginalize(posterior, function(x) {return x.pointInMiddle}))

It doesn't really work that well, the distribution isn't as nice and it takes way longer since it's unlikely to get a value close to 0 if you pick things randomly.

d)

something close to 1.5% of the samples are being accepted, which makes sense since if we fit around -10 and we want the middle point to be close to 0 then it seems unlikely that searching for points near point1 will give us the right answer; using drift kernel for point2 like uniformdrift seems like it would work better since we can limit where in the distribution we look (I think? the details a little unclear to me).

var interpolate = function(point1, point2, interpolationWeight) { return (point1 * interpolationWeight + point2 * (1 - interpolationWeight)) } var model = function(){ var point1 = -10; var point2 = uniformDrift({a: -100, b: 100, w: 20}); var interpolationWeight = uniform(0,1); var pointInMiddle = interpolate(point1, point2, interpolationWeight); observe(Gaussian({mu: 0, sigma:0.1}), pointInMiddle) return {point2, interpolationWeight, pointInMiddle} } var posterior = Infer({ method: 'MCMC', Kernel: 'MH', verbose: true, samples: 5000, lag: 100, }, model) var samples = posterior.samples; viz(marginalize(posterior, function(x) {return x.pointInMiddle})) // Store these for future use editor.put("posterior", posterior) editor.put("samples", samples)

well, whatever it's doing, it's helping a lot!

### Learning as conditional inference

#### 1

a)

var weightPosterior = function(observedData){ return Infer({method: 'MCMC', burn:1000, samples: 50000}, function() { var fakeweight = flip(.95) ? 1 : uniform({a: 0, b: 1}); var realWeight = flip(.999) ? .5 : fakeweight var coin = Bernoulli({p: realWeight}) var obsFn = function(datum){observe(coin, datum=='h')} mapData({data: observedData}, obsFn) return realWeight }) } var fullDataSet = repeat(50, function(){return 'h'}); var observedDataSizes = [0,1,2,4,6,8,10,12,15,20,25,30,40,50]; var estimates = map(function(N) { return expectation(weightPosterior(fullDataSet.slice(0,N))) }, observedDataSizes); viz.line(observedDataSizes, estimates);

b)

It more quickly converges upon a weight of 1 compared to before, where it would merely approach 1 over time, which imo matches the intuitive version of this problem much closer.

repeat(50, function(){return flip(.8) ? 'h' : 't'});

However, the naive uniform prior will more quickly learn if the true value of the coin is not 1, taking closer to 25 flips to start learning rather than taking 40.

#### 2

a)

var pseudoCounts = {a: 10, b: 10}; var weightPosterior = function(observedData){ return Infer({method: 'MCMC', burn:1000, samples: 1000}, function() { var coinWeight = beta(pseudoCounts) var coin = Bernoulli({p: coinWeight}) var obsFn = function(datum){observe(coin, datum=='h')} mapData({data: observedData}, obsFn) return coinWeight }) } //creating 50 pairs of 'h' and 't' alternating var fullDataSet = repeat(50,function(){['h', 't']}).flat() var prior = weightPosterior(['h', 't']) //unsure if better way to viz prior var post = weightPosterior(fullDataSet) viz(prior) //should graph the prior distribution on weights viz(post) //should graph the posterior distribution on weights

learning is occurring; it's just learning that the stdev around the mean is much, much tighter (since the distribution is so consistently 50/50) and thereby dramatically reduces the density around all points aside from the mean.

b)

var pseudoCounts = {a: 10, b: 10}; var weightPosterior = function(observedData){ return Infer({method: 'MCMC', burn:1000, samples: 1000}, function() { var coinWeight = beta(pseudoCounts) var coin = Bernoulli({p: coinWeight}) var obsFn = function(datum){observe(coin, datum=='h')} mapData({data: observedData}, obsFn) return coinWeight }) } //creating 50 pairs of 'h' and 't' alternating var fullDataSet = repeat(50,function(){['h', 't']}).flat() var observedDataSizes = [0,2,4,8,16,32,64,128,256,512]; var posts = map(function(N) { return weightPosterior(fullDataSet.slice(0,N)) }, observedDataSizes); // returns an array of posteriors of length observedDataSizes.length var variances = map(function(p){ var x = expectation(p) var y = expectation(p, function(z){return Math.pow(z-x, 2)}) return y }, posts) viz.line(observedDataSizes, variances);

#### 3

a)

var observedData = [{C:true, E:true}, {C:true, E:true}, {C:false, E:false}, {C:false, E:false}, {C:true, E:true}]

tightly correlated, likely to not occur unless C also occurs

b)

var observedData = [{C:true, E:true}, {C:true, E:true}, {C:false, E:true}, {C:false, E:true}, {C:true, E:true}]

not very correlated, and E is always true in the set of observations; high background, low explanatory power for C

c)

var observedData = [{C:true, E:true}, {C:true, E:true}, {C:false, E:false}, {C:false, E:true}, {C:true, E:true}, {C:false, E:true}, {C:true, E:true}]

correlated, but not 1:1, E is usually true anyways but C provides good information about when E is likely to be true and thereby has high causal power despite E being high background probability anyways.

d)

this will fail if E is always true regardless of whether or not C is true;

var observedData = [{C:true, E:true}, {C:true, E:true}, {C:false, E:true}, {C:false, E:true}, {C:true, E:true}, {C:true, E:true}, {C:false, E:true}, {C:true, E:true}]

this gives roughly uniform probabilities for causal power, but will only get worse as you add more C: false conditions

### Learning with a Language of Thought

#### 1

a)

because the probability of constant or x are the same, but among constants it could be any number between 0 and 9, so the probability is divided among all of them.

b)

the second one is x^{x}

c)

Make it less likely for exponentiation to include a non-constant, I guess.

#### 2

I'll do this later lol

### Hierarchical Models

#### 1

a)

var colors = ['black', 'blue', 'green', 'orange', 'red']; var observedData = [ {bag: 'bag1', draw: 'blue'}, {bag: 'bag1', draw: 'blue'}, {bag: 'bag1', draw: 'black'}] // first model: set alpha = [1, 1, 1, 1, 1] and observe `observedData` var observed = Infer({method: 'MCMC', samples: 20000}, function(){ var makeBag = mem(function(bag){ var colorProbs = dirichlet(ones([colors.length, 1])) return Categorical({vs: colors, ps: colorProbs}) }) var obsFn = function(datum){ observe(makeBag(datum.bag), datum.draw) } mapData({data: observedData}, obsFn) return {bag1: sample(makeBag('bag1'))} }) viz.marginals(observed) // second model. Set alpha = [2, 3, 1, 1, 1] var usealpha = Infer({method: 'MCMC', samples: 20000}, function(){ var makeBag2 = mem(function(bag){ var colorProbs = dirichlet(Vector([2, 3, 1, 1, 1])) return Categorical({vs: colors, ps: colorProbs}) }) var obsFn = function(datum){ observe(makeBag2(datum.bag), datum.draw) } mapData({data: observedData}, obsFn) return {bag1: sample(makeBag2('bag1'))} }) viz.marginals(usealpha) // should roughly match first figure

#### 2

a)

// your code here var makeBarrel = mem(function(barr){ var dist = beta({a: .1, b: .2}) return function(N){ return repeat(N, function(){flip(dist)}) } }) // Do not edit this function: it tests your code var post = Infer({method: 'forward'}, function(){ var abarrel = makeBarrel('b') return Math.sum(abarrel(10)) }) viz(post)

b)

var makeStore = mem(function(store){ var goodstore = flip() ? beta({a: .1, b: .2}) : beta({a: .3, b: .1}) return mem(function(N){ var dist = goodstore return function(z){ return repeat(z, function(){flip(dist)}) } }) }) // Following code inspects your functions viz(Infer({method: 'forward', samples:10000}, function(){ var S = makeStore('S') var B1 = S('B1') var B2 = S('B2') return Math.abs(Math.sum(B1(10))-Math.sum(B2(10))) })) // should generally be little difference viz(Infer({method: 'forward', samples:10000}, function(){ var S1 = makeStore('S1') var S2 = makeStore('S2') var B1 = S1('B1') var B2 = S2('B2') return Math.abs(Math.sum(B1(10))-Math.sum(B2(10))) })) // difference should be larger on average

c)

var makeCity = mem(function(city){ var applecity = flip() ? beta({a: .8, b: .2}) : beta({a: .1, b: .3}) return mem(function(store){ var goodstore = flip(applecity) ? beta({a: .1, b: .2}) : beta({a: .3, b: .1}) return mem(function(N){ var dist = goodstore return function(z){ return repeat(z, function(){flip(dist)}) } }) }) }) //Make sure the following code runs: var C1 = makeCity("C1") var S1 = C1("S1") var B1 = S1("B1") viz(Infer({method: 'forward'}, function(){ return Math.sum(B1(10)) })) //repeat to see different kinds of cities

d)

var makeCity = mem(function(city){ var applecity = flip() ? beta({a: .8, b: .2}) : beta({a: .1, b: .3}) return mem(function(store){ var goodstore = flip(applecity) ? beta({a: .1, b: .2}) : beta({a: .3, b: .1}) return mem(function(N){ var dist = goodstore return function(z){ return repeat(z, function(){flip(dist)}) } }) }) }) var C1 = makeCity("C1") var S1 = C1("S1") var B1 = S1("B1") var S2 = C1("S2") var B2 = S2("B2") viz(Infer({method: 'forward'}, function(){ condition(Math.sum(B1(10)) == 7) return Math.sum(B2(10)) }))

#### 3

a)

This feels really inelegant and I'm not sure it's right; but it seems vowel-words actually take longer to read when you account for this.

var data = [{group: "vowel", word: "abacus", id: 1, rt: 210}, {group: "vowel", word: "abacus", id: 2, rt: 212}, {group: "vowel", word: "abacus", id: 3, rt: 209}, {group: "vowel", word: "aardvark", id: 1, rt: 200}, {group: "vowel", word: "aardvark", id: 2, rt: 201}, {group: "vowel", word: "aardvark", id: 3, rt: 198}, {group: "vowel", word: "ellipse", id: 1, rt: 220}, {group: "vowel", word: "ellipse", id: 2, rt: 222}, {group: "vowel", word: "ellipse", id: 3, rt: 219}, {group: "consonant", word: "proton", id: 1, rt: 190}, {group: "consonant", word: "proton", id: 2, rt: 191}, {group: "consonant", word: "proton", id: 3, rt: 189}, {group: "consonant", word: "folder", id: 1, rt: 180}, {group: "consonant", word: "folder", id: 2, rt: 182}, {group: "consonant", word: "folder", id: 3, rt: 178}, {group: "consonant", word: "fedora", id: 1, rt: 230}, {group: "consonant", word: "fedora", id: 2, rt: 231}, {group: "consonant", word: "fedora", id: 3, rt: 228}, {group: "consonant", word: "fedora", id: 1, rt: 231}, {group: "consonant", word: "fedora", id: 2, rt: 233}, {group: "consonant", word: "fedora", id: 3, rt: 230}, {group: "consonant", word: "fedora", id: 1, rt: 230}, {group: "consonant", word: "fedora", id: 2, rt: 232}, {group: "consonant", word: "fedora", id: 3, rt: 228}] var post = Infer({method: "MCMC", kernel: {HMC: {steps: 10, stepSize: 1}}, samples: 10000}, function(){ var groupMeans = {vowel: gaussian(200, 100), consonant: gaussian(200, 100)} var wordMeans = {abacus: gaussian(groupMeans['vowel'] - 100, groupMeans['vowel'] + 100), aardvark: gaussian(groupMeans['vowel'] - 100, groupMeans['vowel'] + 100), ellipse: gaussian(groupMeans['vowel'] - 100, groupMeans['vowel'] + 100), proton: gaussian(groupMeans['consonant'] - 100, groupMeans['consonant'] + 100), folder: gaussian(groupMeans['consonant'] - 100, groupMeans['consonant'] + 100), fedora: gaussian(groupMeans['consonant'] - 100, groupMeans['consonant'] + 100) } var obsFn = function(d){ //assume response times (rt) depend on group means with a small fixed noise: observe(Gaussian({mu: wordMeans[d.word], sigma: 5}), d.rt) } mapData({data: data}, obsFn) //explore the difference in means: return groupMeans['vowel']-groupMeans['consonant'] }) print("vowel - consonant reading time:") viz(post) print(expectation(post))

b)

var data = [{group: "vowel", word: "abacus", id: 1, rt: 210}, {group: "vowel", word: "abacus", id: 2, rt: 212}, {group: "vowel", word: "abacus", id: 3, rt: 209}, {group: "vowel", word: "aardvark", id: 1, rt: 200}, {group: "vowel", word: "aardvark", id: 2, rt: 201}, {group: "vowel", word: "aardvark", id: 3, rt: 198}, {group: "vowel", word: "ellipse", id: 1, rt: 220}, {group: "vowel", word: "ellipse", id: 2, rt: 222}, {group: "vowel", word: "ellipse", id: 3, rt: 219}, {group: "consonant", word: "proton", id: 1, rt: 190}, {group: "consonant", word: "proton", id: 2, rt: 191}, {group: "consonant", word: "proton", id: 3, rt: 189}, {group: "consonant", word: "folder", id: 1, rt: 180}, {group: "consonant", word: "folder", id: 2, rt: 182}, {group: "consonant", word: "folder", id: 3, rt: 178}, {group: "consonant", word: "fedora", id: 1, rt: 230}, {group: "consonant", word: "fedora", id: 2, rt: 231}, {group: "consonant", word: "fedora", id: 3, rt: 228}, {group: "consonant", word: "fedora", id: 1, rt: 231}, {group: "consonant", word: "fedora", id: 2, rt: 233}, {group: "consonant", word: "fedora", id: 3, rt: 230}, {group: "consonant", word: "fedora", id: 1, rt: 230}, {group: "consonant", word: "fedora", id: 2, rt: 232}, {group: "consonant", word: "fedora", id: 3, rt: 228}] var post = Infer({method: "MCMC", kernel: {HMC: {steps: 10, stepSize: 1}}, samples: 10000}, function(){ var groupMeans = {vowel: gaussian(200, 100), consonant: gaussian(200, 100)} var gmminv = groupMeans['vowel'] - 100 var gmmaxv = groupMeans['vowel'] + 100 var gmminc = groupMeans['consonant'] - 100 var gmmaxc = groupMeans['consonant'] + 100 var idMeans = {"1": gaussian((gmminv+gmminc)/2, (gmmaxv+gmmaxc)/2), "2": gaussian((gmminv+gmminc)/2, (gmmaxv+gmmaxc)/2), "3": gaussian((gmminv+gmminc)/2, (gmmaxv+gmmaxc)/2), } var wordMeans = {abacus: gaussian(gmminv, gmmaxv), aardvark: gaussian(gmminv, gmmaxv), ellipse: gaussian(gmminv, gmmaxv), proton: gaussian(gmminc, gmmaxc), folder: gaussian(gmminc, gmmaxc), fedora: gaussian(gmminc, gmmaxc), } var obsFn = function(d){ //assume response times (rt) depend on group means with a small fixed noise: observe(Gaussian({mu: idMeans[d.id], sigma: 5}), d.rt) observe(Gaussian({mu: wordMeans[d.word], sigma: 5}), d.rt) } mapData({data: data}, obsFn) //explore the difference in means: return groupMeans['vowel']-groupMeans['consonant'] }) print("vowel - consonant reading time:") viz(post) print(expectation(post))

This seems to make the conclusion weaker, suggesting it might've just been one/a few participants much worse at reading vowel words compared to consonant words.

### Occam's Razor

#### 1

a)

There are comparably fewer possibilities in the set of numbers in powers_{of}_{3} (i.e. 0, 3, 9, 27, 81…) compared to multiples of one (all natural numbers), multiples of 3 (1/3 of the natural numbers), or odds (1/2 the natural numbers). Since the size principle dictates that the most constrained hypothesis is probably correct, observing a 3 is implicit negative evidence against the other categories and thereby gets the bulk of the predictive weight.

Note: "fewer possibilities" is probably not the right way to say this, since there exists a bijection between powers of 3 and the natural numbers meaning their cardinality is the same. What is important, here, is that sampling a random number is quite a bit less likely to be a power of 3 than an odd number, and that's really what we care about here (not the size of the sets)

b)

///fold: var filterByInRange = function(set) { var inRange = function(v) {v <= 100 && v >= 0}; return _.uniq(filter(inRange, set)) } var genEvens = function() { return filter(function(v) {return v % 2 == 0}, _.range(1, 101)) } var genOdds = function() { return filter(function(v) {return (v + 1) % 2 == 0}, _.range(1, 101)) } var genMultiples = function(base) { var multiples = map(function(v) {return base * v}, _.range(100)) return filterByInRange(multiples) } var genPowers = function(base) { var powers = map(function(v) {return Math.pow(base, v)}, _.range(100)) return filterByInRange(powers) } var inSet = function(val, set) { return _.includes(set, val) } /// // TODO: add a condition to this function that // calls genSetFromInterval with the parameters extracted from // your hypothesis string. // *Hint*: If you're having trouble converting fron strings to integers try the lodash function _.parseInt(). var getSetFromHypothesis = function(rule) { var parts = rule.split('_') return (parts[0] == 'multiples' ? genMultiples(parts[2]) : parts[0] == 'powers' ? genPowers(parts[2]) : parts[0] == 'evens' ? genEvens() : parts[0] == 'odds' ? genOdds() : parts[0] == 'between' ? genSetFromInterval(_.parseInt(parts[1]), _.parseInt(parts[3])) : console.error('unknown rule' + rule)) }; // TODO: this function should construct the interval // of integers between the endpoints a and b var genSetFromInterval = function(a, b) { return _.range(a, b+1) } var makeRuleHypothesisSpace = function() { var multipleRules = map(function(base) {return 'multiples_of_' + base}, _.range(1, 12)) var powerRules = map(function(base) {return 'powers_of_' + base}, _.range(1, 12)) return multipleRules.concat(powerRules).concat(['evens', 'odds']) } // TODO: build a list of all possible hypothesis intervals between 1 and 100. var makeIntervalHypothesisSpace = function() { // Note: Don't change start and end. var start = 1 var end = 100 // Your code here... // *Hint* Make sure to model this after makeRuleHypothesisSpace, which returns a list of strings that are // parsed in getSetFromHypothesis. E.g. Think of a format like 'between_a_and_b'. var pairs = _.flatten(map(function(st) { return map(function(en) { return [st, en]; }, genSetFromInterval(st+1, end)) }, genSetFromInterval(start, end))) return map(function(x){ return "between_" + x[0] + "_and_" + x[1] }, pairs) } // Takes an unordered array of examples of a concept in the number game // and also a test query (i.e. a new number that the experimenter is asking about) var learnConcept = function(examples, testQuery) { Infer({method: 'enumerate'}, function() { var rules = makeRuleHypothesisSpace() // TODO: build space of intervals var intervals = makeIntervalHypothesisSpace() // TODO: implement a hypothesis prior that first assigns probability *lambda* to rules // and (1- lambda) to intervals, then samples uniformly within each class var lambda = 0.5 var hypothesis = flip(lambda) ? uniformDraw(rules) : uniformDraw(intervals) var set = getSetFromHypothesis(hypothesis) mapData({data: examples}, function(example) { // note: this likelihood corresponds to size principle observe(Categorical({vs: set}), example) }) return {hypothesis, testQueryResponse : inSet(testQuery, set)} }); } var examples = [3] var testQuery = 12 var posterior = learnConcept(examples, testQuery) marginalize(posterior, function(x) {return x.hypothesis})

c)

The probabilities are there, but very small in comparison to the simpler ones for [3] The probabilities are about half as much for the multiples of 3 one [3, 6, 9] The range possibilities are the only ones left for the last one

d)

People seem to care a good amount about digit-wise patterns compared to the machine (i.e. "ends in 3")

#### 2

a)

var observedData = [{C:true, E:false}] var causalPost = Infer({method: 'MCMC', samples: 10000, lag:2}, function() { // Is there a causal relation between C and E? var causalrelation = flip() // Causal power of C to cause E var cp = uniform(0, 1) // Background probability of E var b = uniform(0, 1) mapData({data: observedData}, function(datum) { // The noisy causal relation to get E given C var caused = causalrelation ? (datum.C && flip(cp)) : false var E = caused || flip(b) condition(E == datum.E) }) return {causalrelation, b, cp} }) viz.marginals(causalPost)

b)

var observedData = [{C:true, E:false}] var causalPost = Infer({method: 'MCMC', samples: 10000, lag:2}, function() { // Is there a causal relation between C and E? var causalrelation = flip() // Causal power of C to cause E var cp = uniform(0, 1) // Background probability of E var b = uniform(0, 1) var noisyOrMarginal = function(pt){ return Infer({method:"enumerate"}, function(){ var caused = causalrelation ? (pt && flip(cp)) : false return caused || flip(b) }) } mapData({data: observedData}, function(datum) { observe(noisyOrMarginal(datum.C),datum.effect) }) return {causalrelation, b, cp} }) viz.marginals(causalPost)

d)

casual selection biases the model towards believing there is no relation, since that is a simpler explanation and thereby gets the most weight via bayes' occams razor. Causal power assumes there already is a relation and merely estimates the parameter from something equivalent to no relationship, which means it will be biased into moving more quickly.

### Mixture Models

#### 1

a)

///fold: var expectationOver = function(results, group) { return function(property) { return expectation(results, function(v) {return v[group][property]}) } } /// var properties = ['antennae', 'green', 'blarghNoise'] var data = [ {antennae : false, green: false, blarghNoise: false}, {antennae : true, green: true, blarghNoise: true}, {antennae : true, green: true, blarghNoise: true}, {antennae : true, green: true, blarghNoise: true}, {antennae : false, green: false, blarghNoise: false}, {antennae : true, green: true, blarghNoise: true}, {antennae : false, green: false, blarghNoise: false}, {antennae : true, green: true, blarghNoise: true}, {antennae : false, green: false, blarghNoise: false}, {antennae : false, green: false, blarghNoise: false} ] // Todo: sampleGroupPrototype takes a group and returns an object // with property / probability pairs. E.g. {antannae: 0.2, green: 0.3, blarghNoise: 0.9} // *Hint* lodash _.zipObject is useful for building dictionaries! var sampleGroupPrototype = mem(function(groupName) { var attributes = ['antennae', 'green', 'blarghNoise'] var distros = repeat(3, function(){ return beta({a: 1, b: 1})}) return _.zipObject(attributes, distros) }) var results = Infer({method: 'MCMC', kernel: {HMC: {steps: 10, stepSize: .01}}, samples: 3000}, function(){ mapData({data: data}, function(datum) { var alienName = flip() ? 'group1' : 'group2' var prototype = sampleGroupPrototype(alienName) mapData({data: properties}, function(property) { observe(Bernoulli({p: prototype[property]}), datum[property]) }) }) return {group1: sampleGroupPrototype('group1'), group2: sampleGroupPrototype('group2')} }) viz.bar(properties, map(expectationOver(results, 'group1'), properties)) viz.bar(properties, map(expectationOver(results, 'group2'), properties))

b)

It's highly likely that a noisy alien sound comes from an alien which is green with antennae, although that's not always the case and could be a rude assumption depending on the specific culture of the alien planet.

#### 2

a)

var scores = [45, 45, 44, 45, 44, 45, 45, 45, 45, 45, 30, 20, 6, 44, 44, 27, 25, 17, 14, 27, 35, 30] var subjIDs = _.range(scores.length) var data = map(function(datum) {return _.zipObject(['subjID', 'score'], datum)}, _.zip(subjIDs, scores)); var inferOpts = {method: 'MCMC', samples: 10000} var results = Infer(inferOpts, function() { var good_prob = uniform(0.7, 1) var cheat_prob = uniform(0, 0.7) var cheater = mem(function(name){ return flip() }) var obsFn = function(datum){ var error = cheater(datum["subjID"]) ? cheat_prob : good_prob observe(Binomial({p: error, n: 45}), datum["score"]) } mapData({data: data}, obsFn) var who_cheated = map(function(datum){return cheater(datum["subjID"])}, data) return who_cheated }) viz.marginals(results)

b)

Definitely seems like not everybody followed instructions, you can tune the parameters a bit but if some of the malingerers got 44s I doubt much tuning is going to filter them out.

### Social Cognition

#### 1

a)

var actionPrior = Categorical({vs: ['a', 'b', 'c'], ps: [1/3, 1/3, 1/3]}); var foodPrior = Categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [1/3, 1/3, 1/3]}); var vendingMachine = function(state, action) { return (action == 'a' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.8, .1, .1]}) : action == 'b' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.1, .8, .1]}) : action == 'c' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.1, .1, .8]}) : 'nothing'); } var chooseAction = function(goal, transition, state, deceive) { return Infer({method: 'enumerate'}, function() { var action = sample(actionPrior); condition(deceive ? !goal(transition(state, action)) : goal(transition(state, action))) return action; }) }; var goalPosterior = Infer({method: 'enumerate'}, function() { var deceive = flip(); var goalFood = sample(foodPrior); var goal = function(outcome) {return outcome == goalFood} var sallyActionDist = chooseAction(goal, vendingMachine, 'cookie', deceive); condition(sample(sallyActionDist) == 'b'); return goalFood; }); viz.auto(goalPosterior);

b)

change the condition in goalPosterior

condition(sample(sallyActionDist) == 'b' && sample(sallyActionDist) == 'b');

if she does 'b' twice then she probably actually wants the cookie

if she does 'a' and then 'b' she's probably trying to hide that she wants a doughnut

#### 2

a)

var montyRandom = function(aliceDoor, prizeDoor) { return Infer({method: 'enumerate'}, function() { return sample(RandomInteger({n: 3}))+1 }) }; var montyAvoidBoth = function(aliceDoor, prizeDoor) { return Infer({method: 'enumerate'}, function() { var pick = sample(RandomInteger({n: 3}))+1 condition(pick != aliceDoor && pick != prizeDoor) return pick }); }; var montyAvoidAlice = function(aliceDoor, prizeDoor) { return Infer({method: 'enumerate'}, function() { var pick = sample(RandomInteger({n: 3}))+1 condition(pick != aliceDoor) return pick }) }; var montyAvoidPrize = function(aliceDoor, prizeDoor) { return Infer({method: 'enumerate'}, function() { var pick = sample(RandomInteger({n: 3}))+1 condition(pick != prizeDoor) return pick }) }; Infer({method: 'enumerate'}, function() { var aliceDoor = sample(RandomInteger({n: 3}))+1 var prizeDoor = sample(RandomInteger({n: 3}))+1 var montyFunction = montyAvoidPrize; var montyDoorDist = montyFunction(aliceDoor, prizeDoor); condition(aliceDoor == 1 && sample(montyDoorDist) == 3) return (prizeDoor == 2) - (prizeDoor == 1) });

Alice should switch for bothavoid but it doesnt matter for random

b)

Random

alicedoor | prizedoor | montydoor | P(alice, prize, monty) |
---|---|---|---|

1 | 1 | 1 | (1/3)^{3} |

1 | 1 | 2 | (1/3)^{3} |

1 | 1 | 3 | … |

3 | 3 | 3 | etc |

Both-Avoid

alicedoor | prizedoor | montydoor | P(alice, prize, monty) |
---|---|---|---|

1 | 1 | 1 | 0 |

1 | 1 | 2 | (1/3)^{2} * (1/2) |

1 | 1 | 3 | (1/3)^{2} * (1/2) |

1 | 2 | 1 | 0 |

1 | 2 | 2 | 0 |

1 | 2 | 3 | (1/3)^{2} |

1 | 3 | 1 | 0 |

1 | 3 | 2 | (1/3)^{2} |

1 | 3 | 3 | 0 |

… | … | … | … |

3 | 3 | 3 | 0 |

All of the rows are equal for random, meaning you don't get any additional information from learning monty's door

You are less likely to land on a row where monty reveals a door and you have the right one (1/3)^{2} * (1/2) than you are to land on a row where monty's hand was forced because there was only one door remaining (1/3)^{2}

Put another way, here are the options if you condition alicedoor = 1 and montydoor = 3

alicedoor | prizedoor | montydoor | P(alice, prize, monty) |
---|---|---|---|

1 | 1 | 3 | (1/3)^{2} * (1/2) |

1 | 2 | 3 | (1/3)^{2} |

Whereas for random, conditioning doesnt get you anything

c)

It doesn't matter for avoidAlice

d)

It doesn't matter for avoidPrize

e)

I can see two good reasons for why this problem feels unintuitive:

- The hypothesis space of rules which select doors overwhelmingly favor it not mattering, and it's possible we just average over the hypothesis space when making intuitive judgments.
- We could just be assuming similar information available to monty and alice; that monty just knows that alice picked door 1 and opened a random door she didn't pick. In this scenario, the fact that monty's door revealed a goat isn't actually information, but rather just coincidence – it could have just as easily been a car.

## Supplementary Readings

The probmods webbook itself has a number of supplementary readings (all of which are excellent and fairly important, so you should do them) but there are some extra readings which are probably useful supplementary material on top of those - a number of them I assume are supposed to be true readings, but the readings pages are often broken so it's hard to say for sure. I took these from the Psych204/CS428 Fall 2018 syllabus, which is a class using this book taught by Noah Goodman (the first author / editor of probmods). I haven't reproduced all of them here, mostly the ones which I don't think are well-covered by the existing readings.

- Concepts in a probabilistic language of thought (alternatively, just Ping Pong in Church)
- Optimal predictions in everyday cognition
- What Do We Actually Know About the Economy? (Wonkish)
- Descriptive vs optimal bayesian modeling (blog post)
- Chapter 1 of The Adaptive Character of Thought (preview pdf has chapter 1, supplements marr's vision)
- Rational Use of Cognitive Resources: Levels of Analysis Between the Computational and the Algorithmic (similar to above, deals with resource-rationality which is a more computational ACT-Rish idea)
- A Rational Analysis of Rule-based Concept Learning
- Structure and strength in causal induction