Some background: A few months ago, Nathan presented his free web tool for performing Monte Carlo sims on a position. For the distribution of future price action, he uses a Normal distribution. During the presentation, Jim Riggio asked about substituting a different distribution, possibly built from historical data. The response was something like "it is not easy to do" (I forget the reasoning, and at the time did not give it adequate thought).

So now I would like to revisit the topic of substituting a real distribution as a proxy for estimating future price action. The theory: using a real distribution associated with the underlying should produce more accurate results than using a Normal distribution. (Seems logical.) I do not have a strong statistics background, so I would like feedback from those who do on likely errors/assumptions in my pursuit.

My goal is to produce a Cumulative Distribution Function by binning historical returns, which can then be applied to future time values by indexing on standard deviation: I merely map the std dev to the corresponding CDF bin, for all bins! My first concern is whether errors are introduced by using historic "daily returns" to map to larger time estimates (1 to 100 days). I should be able to work out the CDF (will probably just use the bin probabilities), so I do not yet anticipate difficulty there.
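To make the plan concrete, here is a rough Python/NumPy sketch of the sort of thing I have in mind. The function names and bin count are placeholders, not anything final: standardize the historical returns into std-dev units, bin them, and accumulate the bin probabilities into a lookup-able CDF.

```python
import numpy as np

def empirical_cdf(daily_returns, n_bins=200):
    """Bin historical daily % returns and build an empirical CDF.

    Returns the bin edges (in standard-deviation units) and the
    cumulative probabilities, so a future move can be looked up
    by its std-dev multiple.
    """
    r = np.asarray(daily_returns, dtype=float)
    z = (r - r.mean()) / r.std(ddof=1)   # express returns in std-dev units
    counts, edges = np.histogram(z, bins=n_bins)
    probs = counts / counts.sum()        # per-bin probabilities
    cdf = np.cumsum(probs)               # cumulative distribution
    return edges, cdf

def cdf_at(edges, cdf, z_value):
    """Approximate P(return <= z_value std devs) as the cumulative
    mass of all bins whose upper edge is at or below z_value."""
    idx = np.searchsorted(edges[1:], z_value, side="right")
    return float(cdf[idx - 1]) if idx > 0 else 0.0
```

With this, `cdf_at(edges, cdf, -1.0)` would give the historical probability of a move worse than one standard deviation down, and so on for any bin.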
Using Excel's "Descriptive Statistics" analysis tool on both daily % returns and weekly % returns for data from 1986 to present produces the following results:

Statistic                  Daily       Weekly
Mean                       0.037449    0.144179
Standard Error             0.012750    0.062901
Median                     0.057263    0.291831
Mode                       0           #N/A
Standard Deviation         1.139284    2.290484
Sample Variance            1.297968    5.246317
Kurtosis                   21.35565    5.409974
Skewness                   -0.83240    -0.72341
Range                      32.04696    28.90254
Minimum                    -20.4669    -18.1955
Maximum                    11.58004    10.70707
Sum                        298.9918    191.1809
Count                      7984        1326
Largest(1)                 11.58004    10.70707
Smallest(1)                -20.4669    -18.1955
Confidence Level (95.0%)   0.024994    0.123396

Note: The weekly series is sloppily created by capturing each Friday close versus the prior Friday close, so a non-trading Friday results in one fewer sample, with the next sample covering two weeks instead of one. I don't think this sloppy sampling introduces significant error. All data are % returns: ((sample - sample[-1]) / sample[-1]) %, where "[-1]" means the prior sample.

Can someone with a good grasp of statistics confirm that the Daily and Weekly summaries above do NOT suggest a flaw in using daily return data to produce a CDF that will be used to infer different ranges of future time deltas? BTW: This is for SPX only! Thanks in advance. PS: After this, other questions will be pursued, such as the relevance of some historic periods.
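One quick sanity check on the scaling question, using the numbers above: if daily returns were independent, weekly variance would be about five times daily variance, so the weekly std dev should be roughly the daily std dev times the square root of 5. The numbers here are taken straight from the summaries; the i.i.d. assumption is the thing being tested.

```python
import math

daily_std  = 1.139284   # from the Daily summary above
weekly_std = 2.290484   # from the Weekly summary above

# Under the i.i.d. assumption, weekly std dev ~= daily std dev * sqrt(5).
implied_weekly = daily_std * math.sqrt(5)

print(round(implied_weekly, 4))   # → 2.5475; the observed weekly std is 2.2905
```

The implied value (~2.55) overshoots the observed one (~2.29), and the kurtosis drops sharply from 21.4 (daily) to 5.4 (weekly), so the daily distribution does not simply scale by the square root of time. That gap is exactly the kind of error the 1-to-100-day mapping question is asking about.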
Another perspective: Looking at these same daily returns, but limiting them to Jan 2012 to present, gives the following summary from the Excel function. (With the volatility reduction in recent years, it is less clear there is a lot of value in abandoning the Normal distribution for the historical one.)

Daily Returns, 5 yr
Mean                       0.049319
Standard Error             0.020515
Median                     0.038178
Mode                       0
Standard Deviation         0.774689
Sample Variance            0.600144
Kurtosis                   2.265039
Skewness                   -0.26712
Range                      7.844751
Minimum                    -3.94137
Maximum                    3.903385
Sum                        70.32824
Count                      1426
Largest(1)                 3.903385
Smallest(1)                -3.94137
Confidence Level (95.0%)   0.040243
I disagree that it has been a normal distribution since 2012. Here is the return distribution for SPY since 2012.
Srini: I'm confused by your comment "I disagree"! Can you clarify what you disagree with? The 2012 daily returns for SPX and your presentation of SPY are basically the same. My last post infers that if we base future returns on the 2012-to-present returns, then it is less clear there is a lot of value in abandoning the Normal distribution for the historical one. Perhaps I have missed a point. BTW: Thanks for the post. Is this from some free software/web tool?
Oh, OK. I missed the "less" part in your sentence. Sorry. That chart is from Thomson Reuters software (not free). Edit: This link might be of further interest to you. https://sixfigureinvesting.com/2016...-with-laplace-distribution-instead-of-normal/
No worries. It is difficult to determine which of these "greased pigs" are worth catching. I think there is value in altering the distribution, but it is unclear how much, as is the potential danger that the future periods of interest will not be representative of the historic periods we select as our basis.
I had a brief exchange with Nathan back in July where I asked him the same question (about using real data rather than a Normal distribution). I'm not entirely convinced real data is of any benefit. I consider what happens in the market to be one possibility of many, anyway. I do tend to think losers stick together [on the rare occasion] when the market trends, though. Maybe if you did get a big down move, you could increase the probability of another down move rather than randomly sampling to get the next data point. Here's another question: if you used real data, would you sample with replacement or without?
Mark17: Nathan presents a tool that is generic and can be applied across a number of instruments, so for him to consider historic data seems to be a non-starter (how much additional information would the user need to feed the tool to specify the additional details?). I don't know if he could have done a better job given the ease of use provided. My situation is much easier: I am only interested in SPX, and I have access to all the data I might have interest in (everything is no-touch once I get it coded).

I am thinking about merely picking a historic interval that I "think" may approximate future return ranges. Currently I am trying to determine whether some date ranges should be excluded. I think if I do it correctly, the results will only change very slightly, but my hope is this slight change will be an improvement. It is my belief these changes will improve my Expectancy calculations. As I make further progress, I should know whether I need to "massage" the historic data at all. I don't think so!

Regarding your middle statement, "you could increase the probability of another down move rather than randomly sampling to get the next data point": I have thought about this only briefly, but it seems to take the effort down a different path. Perhaps that can be revisited at some point in the future, after I depart from a Normal distribution expected-price model (baby steps). There seems to be some value in giving more significance to more recent data, but I am not yet comfortable with deeply discounting history for this.

I don't follow your question, "if you used real data then would you sample with replacement or without?" The "real data" is to be used to produce probability bins indexed by % move, fractional std dev moves, or similar.
I think Mark brings up a great point about observed returns -- even if you sample randomly from the historical distribution, with or without replacement, there's an implicit assumption that you're sampling from the "whole" distribution. It should be intuitive why this can't be the case, simply because there's not enough resolution in the available set of historical returns. Additionally, I don't even believe a "whole" distribution exists -- even if we could "see" it, I suspect its shape would be fluid, similar to the 30-day implied distribution in the SPX, for example. That said, it may still be worthwhile to run Monte Carlo with historic distributions, if only to have a baseline comparison to the sims that use Normal distributions. I'd be wary of trusting any expectancy calcs, though; it seems like it would be easy to overfit.
Ice: Why would one "sample randomly from the historical distribution..." intentionally? It seems ludicrous, as you are taking perfectly good information and smearing it with fecal matter! (Pardon my French.) While I am not a statistical expert, every case (only about four in the past 30 years) where I have tried using Monte Carlo simulation to get better insight has resulted in the discovery that Monte Carlo is a crutch when one does NOT know what one is looking for. In each case I have encountered, a direct path to a solution existed, without the added error introduced by the random features of Monte Carlo. I am NOT stating that Monte Carlo is not useful, only that it has never been the appropriate tool for the cases I have encountered. I still do not get the reference (also mentioned by Mark) to "with or without replacement"; pardon, but I'd like to know what it implies. Update: Monte Carlo simulation is fine for "ball-park" estimates, if you do not require precise solutions. (For me, ball-park may be good initially, but not good enough after studying the subject.)
Sampling from the historical distribution randomly allows you to use "real" returns in your simulations without the "ordering" problem; i.e., the returns are not necessarily plucked from the distribution in the same sequence in which they actually occurred. "With or without replacement" simply refers to whether or not you allow a return to be used more than once in your simulation if it has already appeared. (And neither am I a statistical expert.)
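A toy illustration of the difference, using Python's standard `random` module (the return values here are made up):

```python
import random

random.seed(1)  # fixed seed so the illustration is repeatable

historical = [0.3, -1.2, 0.8, 0.1, -0.5, 2.1, -0.9, 0.4]  # toy daily % returns

# With replacement: every simulated day draws from the full history,
# so the same historical return can show up many times in one path.
path_with = random.choices(historical, k=5)

# Without replacement: each historical return is used at most once per
# path, so a path can never be longer than the history itself.
path_without = random.sample(historical, k=5)
```

With replacement, you can also simulate paths far longer than your historical sample; without replacement, each path is just a shuffled subset of history.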
Interesting topic. My thoughts before dealing with this problem in practice (I'm not there yet).

Firstly, seconding Gary's line: are we sure that changing the distribution will help? And if so, is the effort worth the results? This is similar to the dilemma with vol skew: knowledge of historic behavior doesn't guarantee accurate predictions. That doesn't mean historic knowledge is useless, of course; on the contrary, historic knowledge is required before meddling with models.

Secondly: how easy is it to achieve? If we change the distribution type, we should do it in the BS model as well. To my understanding, BS uses the standard normal distribution. Is it worth the effort to change it to a skewed distribution? If so, in what market conditions should we do it, and when should we use another type of distribution? If for some reason we decide to use a historic distribution, the calculations become much more complicated.

I guess it all depends on our 'why'. Why do we want to change the distribution type? If it is for better future prediction, see all that jazz with vol skew. If we want to test strategies, we should (and this is my approach) use the purest data possible (= artificial), then test the results on market data. In that case the distribution type doesn't seem so important (with many caveats; for example, we have to be completely aware of the fat-tails issue, but may give it up for ease of modeling). To my understanding, we may as well be dealing with distributions whose sums differ from 1.

Sorry for the slightly chaotic post. As I said before, I don't have a solid grasp of distributions; for example, my view of the skew-distribution relationship is mostly intuitive. I will try to be quiet, as I don't have much to share, leaving room for others. Gary, thanks for starting this thread. Srini, thanks for the link; it is bookmarked.
Ice: Interesting. Why are you using the terminology "in your simulation..."? I think we are not in the same ball-park. Perhaps I have failed to be clear (happens more frequently than not, sad to say). Nevertheless, just bouncing stuff around here is actually helping me work through the minefields. The notion of "replacement" is not relevant to anything I am doing or considering.

Marcas: Seems you are reading my mind! Some background: I am using some Expectancy calcs, as well as related metrics, in the RTT reporting service, and am extending this set of metrics to all my personal position reporting, as I think it is a possibly worthwhile addition to my cockpit of metrics/indicators. Observing these metrics day after day, one begins to consider them in more depth. For example, I am fully aware of the shortcuts taken in deriving the metrics. I am considering improving their quality by replacing the known-errant assumptions with better substitutes. One of these is the assumption that future prices will follow a Normal distribution (there are three biggies to address, and this is likely the smallest contributor of error, but it is the first in the sequence that should be improved).

Your reference to altering the BSM for the new distribution: I think not! However, I may do some testing to quantify the amount of error remaining. It is something to consider, but unclear whether it will be necessary; adding the new distribution to BSM would likely severely impact my compute time, as I churn through lots of BSM iterations. [Supporting statement for this "theory": I will use the new distribution for the underlying price estimation that is subsequently presented to the BSM, so the error from the BSM using a Normal vs. a skewed distribution is expected to be small.] My current thinking is along these lines (this has been helping me wade through complexity very well recently):
Isolate the unknown from the known, and resolve the known in proper ordered sequence. We still have much known mixed with the unknowns, preventing us from making progress; the presence of the unknowns results in tail-chasing and NO progress. Stare at some of this stuff long enough and more of the knowns come into focus.

BTW: Vol skew is queued in my to-do list, but after a few items that have sequence priority.
I'm probably the one who's mixed up here; I thought the discussion was centered on using historical distributions with Monte Carlo simulations. The 'OptionsRiskCalc' currently uses Monte Carlo, which assumes a Normal dist, if I'm not mistaken.
That's a good point, Gary - if you change the underlying process for Monte Carlo sims, the option pricing model would also need to be updated since both rely on the same process for the underlying price path.
Ah... beginning to synchronize. You are correct! Nathan's tool was the initial seed for much of this. It is my opinion that Nathan used Monte Carlo sims for his OptionsRiskCalc more in the spirit of "well, you can use Monte Carlo sims to produce this info," when he could have more easily NOT used Monte Carlo at all and directly produced better info (at least for my needs). The metrics from Nathan's tool can be derived directly without any randomization! This avoids any notion of sample size (he does 100K samples, if I recall, and each run can produce results that vary quite a bit, yet none of that randomness is necessary OR desirable). I first implemented a Monte Carlo solution, then, on examining it, began to realize the Monte Carlo sim merely took perfectly clean "known" data and injected "unknowns" needlessly into the results. I went through this in detail in my Round Table presentation of the RTT Entry Tool: take the product of the distribution and the PnL graph, sort, and you are done! (No simulations needed.)
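In Python terms, the direct calculation is just a dot product of the per-bin probabilities and the position's PnL at each bin (all numbers here are purely illustrative, not from any actual position):

```python
import numpy as np

# Illustrative inputs: per-bin probabilities for the underlying's price
# at the evaluation date, and the position's PnL at each bin's price.
probs = np.array([0.05, 0.15, 0.30, 0.30, 0.15, 0.05])   # sums to 1
pnl   = np.array([-800, -200,  150,  300,  250, -400])

expectancy = float(np.dot(probs, pnl))      # probability-weighted PnL
prob_profit = float(probs[pnl > 0].sum())   # total mass of profitable bins

print(expectancy, prob_profit)              # → 82.5 0.75
```

No sampling, no run-to-run variation: the exact expectancy falls straight out of the known distribution and PnL curve, and sorting the bins by PnL gives any percentile of interest directly.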
I got it. www.youtube.com/watch?v=wqrGKLnbPow Gary, can you give us a timestamp where you touch on this subject?
A bit over 22 minutes into the presentation, I move to the slide titled "Initial Position probability based examination". Slides 9 and 10 of the presentation detail the replacement algorithm, and slides 15 through 19 of the Supplemental PDF have the supporting details. The PDFs are posted with the Round Table presentation on CD, and may make it easier to examine the details: "30 Aug 17 Round Table". Let me know if something is not clear.