Probability in TFV π Research

1. What the tool actually measures

Pi-Search's Probability tab reports the probability that a given pattern would appear at a random position in a sequence of uniformly distributed random digits. It does this in two ways:

Analytical: closed-form formulas (exact for substring queries; a Gaussian approximation for sum queries, which concern the regular or squared sum of an L-digit window of π).
Empirical Monte Carlo: the user's machine generates random digits and counts how many of N trials satisfy the condition. As N grows, the empirical ratio converges to the true probability, which the analytical formula gives exactly for substring queries and approximates for sum queries.
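A minimal sketch of both paths for a substring query, assuming each Monte Carlo trial draws k fresh uniform digits and checks whether they spell out a k-digit pattern. The function names are hypothetical, not Pi-Search's API. For a k-digit pattern the analytical probability is exactly 10⁻ᵏ, so the empirical ratio should drift toward it as N grows.

```python
import random

DIGITS = "0123456789"


def analytical_substring_probability(pattern: str) -> float:
    """Exact chance that a uniformly random window of len(pattern) digits equals `pattern`."""
    return 10.0 ** -len(pattern)


def monte_carlo_substring_probability(pattern: str, trials: int, seed: int = 0) -> float:
    """Fraction of `trials` in which freshly drawn random digits spell out `pattern`."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    hits = 0
    for _ in range(trials):
        window = "".join(rng.choices(DIGITS, k=len(pattern)))
        if window == pattern:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    pattern = "14"  # hypothetical two-digit query
    print("analytical:", analytical_substring_probability(pattern))
    for n in (1_000, 10_000, 100_000):
        print(f"Monte Carlo, N = {n:,}:", monte_carlo_substring_probability(pattern, n))
```

Seeding the generator is a design choice for reproducibility; a real tool would more likely draw a fresh seed on each run.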

Both numbers describe the same thing: the probability of the digit-sequence occurring by chance. Neither captures everything that matters about a TFV finding.

2. The relevant constants for sum queries

For uniformly distributed random digits d ∈ {0, 1, 2, …, 9}:

Power-of-Aleph (regular sum)

mean of d per digit: μ₁ = 4.5
variance of d per digit: σ₁² = 8.25
expected sum of L digits: 4.5 · L
standard deviation of that sum: √(8.25 · L)

Power-of-Bet (squared sum)

mean of d² per digit: μ₂ = 28.5
variance of d² per digit: σ₂² = 721.05
expected squared sum of L digits: 28.5 · L
standard deviation of that squared sum: √(721.05 · L)
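The four per-digit constants can be checked exactly with rational arithmetic. This sketch (function names are hypothetical) computes E[dᵏ] over the ten digits, derives each variance as E[X²] − E[X]², and scales to an L-digit window, using the fact that means and variances of independent digits add.

```python
import math
from fractions import Fraction

DIGITS = range(10)


def moment(power: int) -> Fraction:
    """Exact E[d**power] for d uniform on {0, ..., 9}."""
    return Fraction(sum(d ** power for d in DIGITS), len(DIGITS))


def digit_constants() -> dict:
    mu1 = moment(1)
    mu2 = moment(2)
    return {
        "mu1": mu1,                    # mean of d
        "var1": mu2 - mu1 ** 2,        # variance of d
        "mu2": mu2,                    # mean of d**2
        "var2": moment(4) - mu2 ** 2,  # variance of d**2
    }


def window_stats(length: int) -> dict:
    """Mean and standard deviation of the regular and squared sums of `length` digits."""
    c = digit_constants()
    return {
        "sum_mean": float(c["mu1"] * length),
        "sum_sd": math.sqrt(float(c["var1"] * length)),
        "sq_mean": float(c["mu2"] * length),
        "sq_sd": math.sqrt(float(c["var2"] * length)),
    }


if __name__ == "__main__":
    for name, value in digit_constants().items():
        print(name, value, float(value))
    print(window_stats(165))
```

The exact fractions are 9/2, 33/4, 57/2 and 14421/20, i.e. 4.5, 8.25, 28.5 and 721.05.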

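For an L-digit window to sum to a specific value X with any reasonable probability, the ratio X/L must sit near 4.5 under Power-of-Aleph and near 28.5 under Power-of-Bet. The spread of X/L shrinks like 1/√L (for the regular sum its standard deviation is √(8.25 / L)), so for any fixed deviation from those ratios the probability falls off exponentially as L grows.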

Example: the first 165 decimal digits of π sum to 737 (regular), so X/L ≈ 4.47. The expected sum is 4.5 · 165 = 742.5 with standard deviation √(8.25 · 165) ≈ 36.9, so 737 lies about 0.15 standard deviations below the mean. That alone is not surprising. What makes it striking is that 737 is the gematria of "שאמר לעולמו די" (Sh'eamar Le'olamo Dai) and 165 is the gematria of נקודה (Nekudah, "point"): meaningful Hebrew terms aligned to a meaningful length give the finding its weight. The next section explains why.
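A sketch of the digit-only numbers for this example, assuming the analytical figure for a sum query is the point probability P(S = X) that an L-digit window sums to exactly X. It computes the z-score, a continuity-corrected Gaussian estimate, and, as a cross-check, the exact probability by repeatedly convolving the uniform digit distribution. The convolution is an alternative method chosen for this sketch, not necessarily what Pi-Search does, and all names are hypothetical.

```python
import math

MU1, VAR1 = 4.5, 8.25  # per-digit mean and variance of d, from Section 2


def z_score(total: int, length: int) -> float:
    """Standard deviations between `total` and the expected sum of `length` digits."""
    return (total - MU1 * length) / math.sqrt(VAR1 * length)


def normal_cdf(z: float) -> float:
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def gaussian_point_probability(total: int, length: int) -> float:
    """Gaussian estimate of P(sum == total), using a continuity correction of ±0.5."""
    mean, sd = MU1 * length, math.sqrt(VAR1 * length)
    return normal_cdf((total + 0.5 - mean) / sd) - normal_cdf((total - 0.5 - mean) / sd)


def exact_point_probability(total: int, length: int) -> float:
    """Exact P(sum == total), convolving the uniform digit distribution `length` times."""
    counts = [1]  # counts[s] = number of digit strings of the current length summing to s
    for _ in range(length):
        nxt = [0] * (len(counts) + 9)
        for s, c in enumerate(counts):
            for d in range(10):
                nxt[s + d] += c
        counts = nxt
    if not 0 <= total < len(counts):
        return 0.0
    return counts[total] / 10 ** length  # exact big-int ratio, rounded once to float


if __name__ == "__main__":
    print(f"z = {z_score(737, 165):+.3f}")
    print(f"Gaussian P(S = 737) = {gaussian_point_probability(737, 165):.5f}")
    print(f"exact    P(S = 737) = {exact_point_probability(737, 165):.5f}")
```

The z-score comes out near −0.15, matching the text, and both point probabilities are about 0.0107: roughly 1 in 94 windows of 165 random digits sum to exactly 737.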

3. The Hebrew constraint — why the real probability is smaller

Pi-Search's reported probability answers: "how often does a uniformly-random sequence of digits sum to X over L positions?"

A TFV finding answers something stricter: "how often does a uniformly-random sequence of digits, with L being the gematria of a meaningful Hebrew term and X being the gematria of another meaningful Hebrew term, exhibit that alignment?"

The Hebrew side adds at least three constraints that the digit side does not:

3a. Not every integer is a gematria of a real Hebrew word

Hebrew letters have specific values (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400). A word or phrase of k letters can reach only certain sums between k and 400 · k (no two-letter word, for example, can total 199). The set of integers that are the gematria of an attested Hebrew word, root, or phrase is sparse compared to all integers.
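A small sketch of the letter values above. It treats final forms (ך ם ן ף ץ) as having the same value as their regular letters, which is an assumption of this sketch; some gematria systems assign them 500 to 900. The hypothetical `reachable_sums(k)` lists every total that some string of exactly k letters can produce, whether or not it is a real word, which shows how few totals short strings can reach.

```python
# Standard letter values; final forms take the value of the regular letter (assumption).
GEMATRIA = {
    "א": 1, "ב": 2, "ג": 3, "ד": 4, "ה": 5, "ו": 6, "ז": 7, "ח": 8, "ט": 9,
    "י": 10, "כ": 20, "ך": 20, "ל": 30, "מ": 40, "ם": 40, "נ": 50, "ן": 50,
    "ס": 60, "ע": 70, "פ": 80, "ף": 80, "צ": 90, "ץ": 90,
    "ק": 100, "ר": 200, "ש": 300, "ת": 400,
}


def gematria(text: str) -> int:
    """Sum of letter values; spaces and other non-letters are ignored."""
    return sum(GEMATRIA.get(ch, 0) for ch in text)


def reachable_sums(k: int) -> set:
    """Every total that some string of exactly k letters can produce (real word or not)."""
    values = sorted(set(GEMATRIA.values()))
    totals = {0}
    for _ in range(k):
        totals = {t + v for t in totals for v in values}
    return totals


if __name__ == "__main__":
    print(gematria("נקודה"))           # 165
    print(gematria("שאמר לעולמו די"))  # 737
    print(199 in reachable_sums(2))     # False: no two letters total 199
    print(199 in reachable_sums(3))     # True: 100 + 90 + 9
```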

3b. Letter ordering matters; gematria does not capture it

Two different orderings of the same letters yield the same gematria but only one is the actual word. For a "finding" to be meaningful, the actual word must exist — not merely some anagram. The probability of both the gematria matching and the letter sequence being a real Hebrew term is much smaller than the probability of the gematria matching alone.
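To make the anagram point concrete, this sketch enumerates every ordering of the letters of נקודה. Only the five letter values needed here are included, and the variable names are illustrative. Because all five letters differ there are 5! = 120 orderings, every one of them totals 165, and only one is the attested word.

```python
from itertools import permutations

# Standard values for the letters of נקודה only.
VALUES = {"נ": 50, "ק": 100, "ו": 6, "ד": 4, "ה": 5}

word = "נקודה"
orderings = {"".join(p) for p in permutations(word)}
totals = {sum(VALUES[ch] for ch in o) for o in orderings}

print(len(orderings))  # 120 distinct orderings
print(totals)          # {165}: the gematria ignores letter order
```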

3c. Length is also constrained

Even before considering ordering, the length of a Hebrew word constrains what sums it can produce. The number of letters in a TFV-canonical phrase is not a free parameter; it is determined by the phrase. Picking 165 because it equals the gematria of נקודה is not a free move — it is one of a small finite set of meaningful lengths a researcher can investigate.

The implication. When Pi-Search reports "this finding has probability ~1 in N", the actual probability — accounting for the orthogonal Hebrew constraint — is generally smaller than 1/N. In other words: TFV findings are typically more significant than the digit-only number suggests, not less.

The tool reports the digit-only probability because it can compute that exactly. The Hebrew-side probability is mediated by what is or isn't a real Hebrew word at a particular length — a question for the foundational TFV research, not for an automated π-digit analyzer.

4. Independence, and when not to multiply

The naive way to handle composite findings is to multiply the probabilities of the individual conditions. This is correct only when the conditions are statistically independent. They usually are not.

Two conditions that examine overlapping digits share information. Knowing the regular sum of the first 165 digits constrains the squared sum of the same 165 digits.
Two conditions anchored at the same position share the same digits — the most overlap possible.
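Composite queries in Pi-Search always anchor at the same position by definition, so the product of probabilities is unreliable. When the observed values are mutually consistent (for example, a regular sum and a squared sum that are both near their expected values), the product typically overstates how unlikely the joint event is; for an inconsistent combination it can understate it.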

The Monte Carlo path is therefore the better empirical guide for composite probabilities: it observes the actual joint distribution of conditions over random digits, without assuming independence.
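For a window short enough to enumerate, the joint distribution of the regular sum S and squared sum Q can be computed exactly, which makes the failure of multiplication easy to see; for realistic L, the same joint counting is what the Monte Carlo path estimates. This sketch uses L = 6 purely for speed, and its names are hypothetical. It also computes the correlation between S and Q, which is the same for every L because the digits are independent.

```python
from collections import Counter
from math import sqrt

DIGITS = range(10)


def sum_square_correlation() -> float:
    """Correlation of d and d**2 for a uniform digit (equal to corr(S, Q) for any L)."""
    def moment(power: int) -> float:
        return sum(d ** power for d in DIGITS) / len(DIGITS)
    cov = moment(3) - moment(1) * moment(2)
    var_d = moment(2) - moment(1) ** 2
    var_d2 = moment(4) - moment(2) ** 2
    return cov / sqrt(var_d * var_d2)


def joint_counts(length: int) -> Counter:
    """Exact counts of (regular sum, squared sum) over all 10**length digit strings."""
    counts = Counter({(0, 0): 1})
    for _ in range(length):
        nxt = Counter()
        for (s, q), c in counts.items():
            for d in DIGITS:
                nxt[(s + d, q + d * d)] += c
        counts = nxt
    return counts


def joint_and_product(length: int, x: int, y: int) -> tuple:
    """Return (P(S = x and Q = y), P(S = x) * P(Q = y)) for an L-digit window."""
    counts = joint_counts(length)
    total = 10 ** length
    p_joint = counts[(x, y)] / total  # a missing key counts as 0
    p_s = sum(c for (s, _), c in counts.items() if s == x) / total
    p_q = sum(c for (_, q), c in counts.items() if q == y) / total
    return p_joint, p_s * p_q


if __name__ == "__main__":
    print(f"corr(S, Q) = {sum_square_correlation():.4f}")
    for x, y in [(0, 0), (1, 2)]:
        joint, product = joint_and_product(6, x, y)
        print(f"S={x}, Q={y}: joint={joint:.3g}, product={product:.3g}")
```

The correlation is about 0.963. For L = 6 the all-zero window (S = 0, Q = 0) has joint probability 10⁻⁶, while the product gives 10⁻¹². The pair S = 1, Q = 2 is impossible (a sum of 1 forces Q = 1), yet the product assigns it about 9 × 10⁻¹¹.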

5. Sample size and diminishing returns

The empirical ratio reported by the Monte Carlo simulation is itself an estimate, with its own uncertainty. The relative standard error in the estimated probability after N trials, when the true probability is p, is approximately:

relative standard error ≈ √((1 − p) / (N · p))

To halve the error, N must quadruple.

Once the empirical ratio settles to two or three significant figures, it has converged enough for ordinary TFV research purposes; beyond that, additional trials yield diminishing returns. By the formula above, the default of 100,000 trials in Pi-Search gives a relative standard error of about 3% for a probability of 1 in 100, about 10% for 1 in 1,000, and about 30% for 1 in 10⁴. For rarer events, or for tighter estimates, raise the trial count.
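The error formula can be turned around to budget trials. This sketch (hypothetical function names) evaluates the relative standard error at the default N and solves N ≈ (1 − p) / (p · r²) for a target relative error r.

```python
import math


def relative_standard_error(p: float, trials: int) -> float:
    """Approximate relative standard error of a Monte Carlo estimate of probability p."""
    return math.sqrt((1 - p) / (trials * p))


def trials_needed(p: float, target_rel_error: float) -> int:
    """Trials required for the relative standard error to fall to `target_rel_error`."""
    return math.ceil((1 - p) / (p * target_rel_error ** 2))


if __name__ == "__main__":
    for p in (1e-2, 1e-3, 1e-4):
        print(f"p = {p:g}: error at N = 100,000 is {relative_standard_error(p, 100_000):.1%}; "
              f"N for 1% error is about {trials_needed(p, 0.01):,}")
```

For p = 1 in 10⁴, reaching a 1% relative error takes on the order of 10⁸ trials.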

For genuinely rare events — probability much less than 1 in N — the analytical Gaussian estimate is usually more informative than any practical Monte Carlo. Pi-Search reports both and the user can choose which one to trust for their specific question.
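A sketch of why Monte Carlo struggles here, using a hypothetical rare query (165 digits summing to 600, about 3.9 standard deviations below the mean) and the same continuity-corrected Gaussian estimate assumed earlier. With N trials and true probability p, the chance that a run records no hits at all is (1 − p)ᴺ. All names are illustrative.

```python
import math

MU1, VAR1 = 4.5, 8.25     # per-digit mean and variance of d, from Section 2
DEFAULT_TRIALS = 100_000  # Pi-Search's default trial count


def gaussian_point_probability(total: int, length: int) -> float:
    """Gaussian estimate of P(sum of `length` random digits == total), continuity-corrected."""
    mean, sd = MU1 * length, math.sqrt(VAR1 * length)

    def cdf(z: float) -> float:
        return 0.5 * math.erfc(-z / math.sqrt(2.0))

    return cdf((total + 0.5 - mean) / sd) - cdf((total - 0.5 - mean) / sd)


def chance_of_zero_hits(p: float, trials: int) -> float:
    """Probability that `trials` independent trials never satisfy the condition."""
    return (1.0 - p) ** trials


if __name__ == "__main__":
    p = gaussian_point_probability(600, 165)  # hypothetical rare query
    print(f"Gaussian estimate: {p:.2e}")
    print(f"expected Monte Carlo hits: {DEFAULT_TRIALS * p:.2f}")
    print(f"chance the run reports zero: {chance_of_zero_hits(p, DEFAULT_TRIALS):.0%}")
```

Here the Gaussian figure is about 6 × 10⁻⁶, so 100,000 trials expect roughly 0.6 hits and come back empty a little over half the time, while the formula still returns a usable number.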

6. Further reading

Pi-Search · corpus integrity verification — how we confirm the corpus is byte-correct.
Pi-Search · verified canonical findings — click-to-verify list.
TFV foundational π research — the original derivations, including probabilistic discussion.