
DiscreteDistribution point masses to be a private attribute #623

Closed
sbenthall opened this issue Apr 14, 2020 · 8 comments

@sbenthall
Contributor

X is not a good attribute name.

It would be better to use something that reflects the mathematical concept this attribute represents.

@sbenthall
Contributor Author

This field is currently named X because that was the parameter name in the old function from which this code was derived.

I think improving this may involve untangling and revising more of the HARK code.

For comparison, it looks like neither Mathematica nor PyMC3 exposes the point values of a discrete distribution. It's possible that with better-designed code, exposing them is not so necessary:
https://reference.wolfram.com/language/ref/DiscreteUniformDistribution.html
https://docs.pymc.io/api/distributions/discrete.html

If I'm reading things right, in the original HARK design, combineIndepDstns was used to bundle the transitory and permanent income shocks into one list of arrays, presumably for convenience.

The design here is a bit baffling. In ConsIndShockModel, which I suppose is paradigmatic, the IncomeDstn contains all the information about the transitory and permanent income shocks, even though these are independent of each other. Yet the method also returns PermShkDstn and TransShkDstn separately. From the looks of it, this is entirely redundant and makes the code more complex.
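For illustration, here is a minimal standalone sketch of what that bundling amounts to; this is not HARK's actual implementation, and the function name is only suggestive of combineIndepDstns:

import numpy as np

def combine_indep_dstns(pmf_a, X_a, pmf_b, X_b):
    # Joint probability of each (a, b) pair is the product of the marginals
    joint_pmf = np.outer(pmf_a, pmf_b).flatten()
    # The joint support is stored as a list of arrays, one per variable
    A, B = np.meshgrid(X_a, X_b, indexing="ij")
    return joint_pmf, [A.flatten(), B.flatten()]

# Two independent two-point shocks
pmf, X = combine_indep_dstns(np.array([0.5, 0.5]), np.array([0.9, 1.1]),
                             np.array([0.25, 0.75]), np.array([0.0, 1.0]))
# pmf and each of X[0], X[1] have 4 entries: the marginal information is
# carried redundantly inside the joint object.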

I'd love to simplify this into a cleaner design. Given the comments on #538, it seems like this is an area that hasn't been given much thought since it was originally written, and a refactor is welcome in principle.

It doesn't look like it's possible to do this cleanly with a dedicated, generic distribution object, such as Mathematica's LogMultinormalDistribution:
https://reference.wolfram.com/language/ref/LogMultinormalDistribution.html

This would be inappropriate for two reasons:

  • the point of a LogMultinormal distribution is to deal with correlations between the multivariate outputs; here the distributions are independent by construction
  • the underlying shock distributions are not truly LogNormal because they have an additional point value added for unemployment

Later, when the shock distributions are actually sampled, it looks like only the IncomeDstn is used, so it's not clear why the PermShkDstn and TransShkDstn objects were created and stored separately in the first place:

IncomeDstnNow = self.IncomeDstn[t - 1]  # set current income distribution
PermGroFacNow = self.PermGroFac[t - 1]  # and permanent growth factor
# Get random draws of income shocks from the discrete distribution
EventDraws = IncomeDstnNow.draw_events(N, seed=self.RNG.randint(0, 2 ** 31 - 1))
# X[0] holds the permanent-shock values, X[1] the transitory-shock values
PermShkNow[these] = IncomeDstnNow.X[0][EventDraws] * PermGroFacNow  # permanent "shock" includes expected growth
TranShkNow[these] = IncomeDstnNow.X[1][EventDraws]

Long story short: the X attribute here seems mainly to be an artifact of poor code style. There's no reason for IncomeDstn to be stored as a multivariate distribution, because its component distributions are independent. We can simplify the Distribution code by restricting the interface to creating the distribution and sampling from it.
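Concretely, a restricted interface might look something like the following sketch; the class and method names are hypothetical, not HARK's API:

import numpy as np

class DiscreteDstn:
    # Discrete distribution with private point masses and a public
    # sampling interface (hypothetical sketch).
    def __init__(self, pmf, points, seed=0):
        self._pmf = np.asarray(pmf)        # probabilities (private)
        self._points = np.asarray(points)  # point values (private)
        self._rng = np.random.default_rng(seed)

    def draw(self, n):
        # Sample n values; callers never touch the point masses directly
        idx = self._rng.choice(len(self._pmf), size=n, p=self._pmf)
        return self._points[idx]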

This change will ripple through the codebase and to downstream code, but seems desirable.

@sbenthall
Contributor Author

Notes from today's meeting:

  • The idea is to have flexibility for a multivariate distribution where the variables are not independent. This pattern is used elsewhere in HARK.
  • Ideally, there should be a multivariate distribution object from which it is possible to extract the marginal distributions, with various options for methods of discretization and approximation (a sketch follows below).
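A sketch of what that object could look like; again, the names are hypothetical and this is not existing HARK code:

import numpy as np

class MultivariateDiscreteDstn:
    def __init__(self, pmf, points):
        self._pmf = np.asarray(pmf)                     # joint probabilities
        self._points = [np.asarray(p) for p in points]  # one array per variable

    def marginal(self, i):
        # Collapse the joint pmf onto variable i's distinct point values
        values, inverse = np.unique(self._points[i], return_inverse=True)
        pmf_i = np.bincount(inverse, weights=self._pmf)
        return pmf_i, values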

@sbenthall
Contributor Author

#730

@sbenthall
Contributor Author

Reopening this. Dolo IID Process integration is now slated for a later milestone (#611).
This can be solved easily by the 1.0 release, and it is directly implicated in the now-manageable solution code simplifications (#625).

@sbenthall sbenthall reopened this Jan 13, 2021
@sbenthall sbenthall self-assigned this Jan 13, 2021
@sbenthall
Contributor Author

This issue is awkwardly named; I see now that this simple "name change" implicates several other issues, including #120 and #625.

It may require further review to determine the right short-term solution.

One option: with calcExpectations (#625) in place, there should be no reason for the point masses to be a public attribute.
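For instance, an expectations helper along these lines would let client code compute moments without ever reading the point masses; this is only a sketch, and the actual calcExpectations signature in #625 may differ:

import numpy as np

def calc_expectation(dstn, func=lambda x: x):
    # Expectation of func over a discrete distribution, computed inside
    # the distribution module so the point masses stay encapsulated
    return np.dot(dstn._pmf, func(dstn._points))

# mean = calc_expectation(dstn)
# second_moment = calc_expectation(dstn, lambda x: x ** 2)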

@sbenthall sbenthall changed the title DiscreteDistribution X is not a good name for the attribute DiscreteDistribution point masses to be a private attribute Jan 19, 2021
@llorracc
Collaborator

The way to think about this is to treat the discretized distributions generically as approximations to idealized continuous distributions. So, for example, we might want to say that the setup is one in which the "truth" is that the permanent shocks to idiosyncratic permanent income and to aggregate stock returns are jointly lognormally distributed with a covariance coefficient of 0.1. But in practice any numerical solution will have to approximate that with some particular discretization of the relationship (a matrix of points and probabilities).

Among competing discretizations, we want an unambiguous standard for "which one is closer to being right," in case the models differ in some important prediction. The unambiguous answer should be: whichever one is closer to the answer you get as you increase the fidelity of the discrete approximation to the continuous assumption. A 1000x1000 approximation is going to give a closer approximation than a 3x3 approximation, so whichever discretization gives an answer closer to the more accurate approximation is right.
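As a concrete illustration of the fidelity point (a standalone sketch, not HARK code): an equiprobable N-point discretization of a lognormal recovers its mean more accurately as N grows.

import numpy as np
from scipy.stats import norm

def discretize_lognormal(mu, sigma, n):
    # Equiprobable n-point approximation: one point per probability bin,
    # placed at the midpoint quantile of each bin
    quantiles = (np.arange(n) + 0.5) / n
    points = np.exp(norm.ppf(quantiles, loc=mu, scale=sigma))
    return np.full(n, 1.0 / n), points

for n in (3, 10, 1000):
    pmf, pts = discretize_lognormal(0.0, 0.1, n)
    print(n, np.dot(pmf, pts))  # approaches exp(0.1**2 / 2) ≈ 1.00501 as n grows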

So, it is not worthwhile to devote any effort to trying to efficiently meld the unemployment shock with the transitory shock. The interesting questions will all have to do with either (a) correlations between the transitory and permanent shocks; or (b) correlations between the unemployment shock and either transitory or permanent shocks (conditional on not becoming unemployed).

PS. I'd really, really like to get into HARK 1.0 the default of our distributions being ones with point masses at the min and max values of some standard deviation, per the discussion elsewhere. Matt and I strongly agree on this subject, for deep and powerful reasons. We both wish I'd done it a long time ago (like, 20 years ago). It's clearly the right way to do things.

@sbenthall
Contributor Author

> PS. I'd really, really like to get into HARK 1.0 the default of our distributions being ones with point masses at the min and max values of some standard deviation, per the discussion elsewhere. Matt and I strongly agree on this subject, for deep and powerful reasons. We both wish I'd done it a long time ago (like, 20 years ago). It's clearly the right way to do things.

Is there an existing issue for this? If not, could you make one describing what you mean?

@sbenthall
Contributor Author

> The way to think about this is to treat the discretized distributions generically as approximations to idealized continuous distributions.

I suppose this will be fixed as part of #640 then.

Looks like this ticket can be closed, as its scope is covered in more detail in other issues.
