
discrete choice modeling blogpost #11

Open · wants to merge 4 commits into main
Conversation

drbenvincent (Contributor)
@drbenvincent drbenvincent commented Oct 31, 2024

This PR adds a notebook which will form the second Colgate client write-up blog post.

The first post was Causal sales analytics: Are my sales incremental or cannibalistic?

NOTE: I'll be pretty aggressive about hiding most of the code cells in the final blogpost in order to maximise readability.

Current state: At this point (2024/10/31) I've basically written the first half of the blog post. It outlines the basic discrete choice model and sets up the core limitation of producing uninteresting cannibalization effects.

TODO

  • We might want to play with the random seed to get the synthetic data nice
  • We might also want to tweak the synthetic price data to allow for better parameter identifiability
  • Potentially add a manufacturer (or benefit) effect to really show the lack of interesting cannibalization effects.
  • I'm hoping that either @ricardoV94 or @lucianopaz or @cluhmann will take over the reins and continue the blog post to talk about the core innovations of what we did. We are allowed to talk about the maths of the nested logit, but we're not allowed to present code to implement it.
  • Hoping someone can write a nice overview of the cool new stuff that was done. I'll then come back in and wrap it up with the executive summary at the start and a conclusion summary at the end.


@ricardoV94

ricardoV94 commented Nov 18, 2024

@drbenvincent I'm leaving some comments from your part of the blogpost, before I start on the second part:

  1. A single intercept doesn't make sense. I assume you meant β_0^i (one intercept per item), as you have in the model description below?
  2. In the model description the line u_{i,t} = a = b doesn't make math sense. You introduce a log of the price, which is clearly not equivalent to the original expression on price. Also the brackets are strange. You are not multiplying the intercept by the price, I assume?
  3. The Multinomial model is very confident because total_sales is high. We obviously had more noise in our data, and that's why we had to switch to the DirichletMultinomial model instead, as the Multinomial cannot explain such large errors with these high total sales. May be worth mentioning?
  4. "So this is all great, but it's the kind of output that data scientists would enjoy." Is there irony in this sentence, or is it missing a "not"?
  5. What-if scenario. Needs a bit more text explaining what results we can see from the 5 plots?
  6. I don't like the plot showing the market share before and after as the distance from the x=y line. This will never show anything interesting unless you remove an item that has a sizeable portion of the market share (which you would never do anyway). It's also mostly wasted white space on the plot. I would rather show the ratio of market share before and after as a bar plot, which will clearly show everything going up by the same %. Conversely: imagine you remove an item with a 1% market share and another item takes all of this market share (very interesting, perfect cannibalization); it would go from x to x+1%, which would still look super boring on the plot you defined. The plot is not good for showing what you want.
  7. The prior for the intercepts should be zero-sum, otherwise there's one too many parameters.

Obligatory message: I think overall the blog is in a pretty nice shape!
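As a minimal numerical sketch of point 6 (all numbers invented for illustration, not taken from the notebook): under a plain multinomial logit, removing an item scales every surviving share by the same factor (the IIA property), so the before/after ratios are identical and neither a diagonal scatter nor a ratio bar plot can show anything but a flat pattern.

```python
import numpy as np

# Hypothetical utilities for 4 items: zero-sum intercepts plus a slope on
# log price, in the spirit of points 1 and 7. All numbers are made up.
beta0 = np.array([0.5, -0.2, 0.1, -0.4])   # sums to zero
beta_price = -1.5
price = np.array([2.0, 1.5, 3.0, 2.5])
u = beta0 + beta_price * np.log(price)

def shares(u):
    e = np.exp(u - u.max())                # numerically stable softmax
    return e / e.sum()

before = shares(u)
after = shares(u[:-1])                     # drop the last item from the market

# Under IIA, after/before is the same constant for every surviving item.
ratios = after / before[:-1]
print(ratios)
```

This is exactly why the simple model can only ever produce "uninteresting" cannibalization: the renormalization constant is shared by all items.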

@ricardoV94

I'm going to push a second NB that uses pre-generated data according to the NLM (nested logit model). I think this will streamline the blogpost, showing where it fails and why the NLM can address it. Not changing the original NB so we can compare, because git diffs suck for NBs

@ricardoV94

ricardoV94 commented Nov 27, 2024

@drbenvincent I pushed an updated nb. I did the following changes:

  1. Replace data generation in the NB with a pre-designed dataset (also uploaded), according to a NLM with two levels (segment -> manufacturer). I tried to use red/blue tones to distinguish the segments. We may want to change it from adults vs kids to something more "competitive" like you had before, if it seems unreasonable that cross-segment cannibalization would be observed. I chose kids for the motivation part for the NLM I wrote. It felt easier with that as an example.
  2. I then define a one-level NLM with segment only, and write a naive code implementation, hard-coded for this case. The idea here is to be just a teaser for the full-fledged code that we'll put in pymc-marketing.
  3. The simulated data with two levels can't be fit perfectly with the one level NLM. I didn't want the model to look too good, and this can then be suggested as a next step we would take if we wanted to improve the model.
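To make the contrast with the simple model concrete, here is a toy nested-logit share computation (my own invented numbers, not the notebook's implementation): removing an item now pulls share mostly from its own segment, while items in the other segment still move together.

```python
import numpy as np

# Toy nested logit: two segments (nests) with two items each.
# lambda < 1 means stronger within-segment substitution. Numbers invented.
u = np.array([0.5, -0.5, 0.3, -0.3])
nests = [np.array([0, 1]), np.array([2, 3])]
lam = 0.5

def nlm_shares(u, nests, lam):
    shares = np.zeros_like(u)
    # Inclusive value of each nest: log-sum-exp of scaled utilities.
    iv = np.array([np.logaddexp.reduce(u[n] / lam) for n in nests])
    p_nest = np.exp(lam * iv) / np.exp(lam * iv).sum()
    for k, n in enumerate(nests):
        within = np.exp(u[n] / lam)
        shares[n] = p_nest[k] * within / within.sum()
    return shares

before = nlm_shares(u, nests, lam)
# Remove item 1 (the second item of the first segment).
after = nlm_shares(u[[0, 2, 3]], [np.array([0]), np.array([1, 2])], lam)

# Item 0 (same segment as the removed item) gains more than items 2 and 3,
# which still share one common ratio: the "two degrees of change".
print(after / before[[0, 2, 3]])
```

With lam = 1 this collapses back to the plain multinomial logit and all ratios become equal again.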

Other stuff:

  1. ZeroSumNormal on the intercepts
  2. Use standardized log-price to avoid the intercept-slope indeterminacy (sampling is much faster)
  3. Remove the diagonal plot and show the relative change in market share at the last day. This shows it is a single degree of change for the simple model, but two degrees of change for the nested model. Adding a second level would make 3 degrees of change, but we actually don't have enough items on the market by that time to show it, so the plot would look the same.
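For reference, point 2 is the usual centering/scaling reparameterization; a small sketch with toy numbers showing that it leaves the linear predictor unchanged while decoupling the intercept from the slope:

```python
import numpy as np

rng = np.random.default_rng(0)
price = rng.lognormal(mean=1.0, sigma=0.3, size=500)  # toy prices

log_p = np.log(price)
z = (log_p - log_p.mean()) / log_p.std()  # standardized log price

# Same linear predictor under the reparameterization:
# beta0 + beta * log_p == beta0c + betac * z
beta0, beta = 0.3, -1.2                    # arbitrary toy coefficients
betac = beta * log_p.std()
beta0c = beta0 + beta * log_p.mean()
assert np.allclose(beta0 + beta * log_p, beta0c + betac * z)
```

With raw log price the intercept has to absorb `beta * mean(log_p)`, so the two posteriors end up strongly correlated; centering removes that trade-off, which is presumably why sampling sped up so much.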

Questions:

  1. Not sure about the ellipsis plot now that we don't have "true" data to reference. They also don't look pretty, but maybe I messed them up.

TODO:

  1. Write something about the math of the model? How deep do we want to go? Perhaps @lucianopaz is the best person for it?
  2. Cleanup the relative change plot to have items on the xticks
  3. Rerun the NB from top to bottom
  4. Add more text on the fitting/predictions/counterfactuals of the NLM.

To run the NB you will need to comment out `compile_kwargs=dict(mode="NUMBA")` from `pm.sample`. That will only be available in the next release of PyMC. I needed it for the model to actually sample at a reasonable pace on my machine.
