Negative Binomial Distribution | The Science of Data

The negative binomial distribution generalizes the geometric distribution. It models the number of independent Bernoulli trials $k$ needed to achieve $r$ successes. It has two parameters: the stopping parameter $r$ (the required number of successes) and the probability of success $p$ on each trial.

Probability Mass Function

Any particular sequence of $k$ Bernoulli trials containing exactly $r$ successes (and hence $k-r$ failures) occurs with probability

\[p^r(1-p)^{k-r}.\]

Because the last trial is always a success, the remaining $r-1$ successes can be placed among the first $k-1$ trials in $\binom{k-1}{r-1}$ ways. Therefore, the PMF of a negative binomial random variable is

\[p_X(k)=\binom{k-1}{r-1}p^r(1-p)^{k-r}\]

where $k=r,r+1,\ldots$.
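
As a quick sanity check, the PMF above can be evaluated directly and compared with R's built-in `dnbinom()`. Note that `dnbinom()` is parameterized by the number of failures before the $r$-th success rather than the number of trials, so it is evaluated at $k-r$. The helper name `nb_pmf` and the values of `r` and `p` are illustrative choices only.

```r
# PMF of the number of trials k needed to reach r successes
nb_pmf <- function(k, r, p) {
  choose(k - 1, r - 1) * p^r * (1 - p)^(k - r)
}

r <- 6    # illustrative number of successes
p <- 0.5  # illustrative success probability
k <- r:(r + 4)

manual  <- nb_pmf(k, r, p)
builtin <- dnbinom(k - r, size = r, prob = p)  # dnbinom counts failures, not trials

# Compare the two up to floating-point tolerance
all.equal(manual, builtin)
```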

A negative binomial random variable can also be written as the sum of $r$ independent geometric random variables with parameter $p$, each counting the trials up to and including the next success, i.e.,

\[X=X_1+\ldots+X_r\]

where $X_i\sim\mathrm{Geom}(p).$
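
The decomposition into geometric random variables can be checked by simulation. The sketch below sums $r$ independent geometric draws and compares the empirical frequencies with the PMF. It assumes the "number of trials" convention used here, so 1 is added to each `rgeom()` draw, which counts failures before the first success. The seed, sample size, and parameter values are arbitrary.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

r <- 6
p <- 0.5
n <- 100000  # number of simulated sums

# rgeom() returns failures before the first success, so add 1 to count trials
geom_trials <- matrix(rgeom(n * r, prob = p) + 1, nrow = n, ncol = r)
x <- rowSums(geom_trials)  # each row sum is one draw of X = X_1 + ... + X_r

# Empirical frequencies vs. the negative binomial PMF for a few values of k
k <- r:(r + 4)
empirical   <- sapply(k, function(kk) mean(x == kk))
theoretical <- choose(k - 1, r - 1) * p^r * (1 - p)^(k - r)
round(cbind(k, empirical, theoretical), 4)
```

The two columns should agree to roughly two or three decimal places at this sample size.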

Expectation

The expectation of a negative binomial distribution is

\[\mathbb{E}[X]=\frac{r}{p}.\]

Proof: By linearity of expectation, and because each geometric $X_i$ has mean $1/p$,

$$\begin{align} \mathbb{E}[X]&=\mathbb{E}[X_1]+\ldots+\mathbb{E}[X_r]\\ &=\frac{1}{p}+\ldots+\frac{1}{p}\\ &=\frac{r}{p} \end{align}$$

Variance

The variance of a negative binomial distribution is

\[\mathrm{var}(X)=\dfrac{r(1-p)}{p^2}.\]

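Proof: Because the $X_i$ are independent, the variance of their sum is the sum of their variances, and each geometric $X_i$ has variance $(1-p)/p^2$, so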

$$\begin{align} \mathrm{var}(X)&=\mathrm{var}(X_1)+\ldots+\mathrm{var}(X_r)\\ &=\frac{1-p}{p^2}+\ldots+\frac{1-p}{p^2}\\ &=\dfrac{r(1-p)}{p^2}. \end{align}$$
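
Both moments can also be checked numerically by summing over the PMF. This is only a sketch: the infinite support is truncated at $k=500$, which is an arbitrary cutoff that leaves a negligible tail for the illustrative values of `r` and `p` used here.

```r
r <- 6
p <- 0.5

# Truncate the infinite support; the tail beyond k = 500 is negligible here
k   <- r:500
pmf <- choose(k - 1, r - 1) * p^r * (1 - p)^(k - r)

mean_x <- sum(k * pmf)                # E[X]
var_x  <- sum((k - mean_x)^2 * pmf)   # var(X)

c(mean = mean_x, formula = r / p)
c(var = var_x, formula = r * (1 - p) / p^2)
```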

Monte Carlo Simulation

library(tidyverse)

reps <- 10000 # number of replications

r <- 6    # number of successes
p <- 0.5  # probability of a success

# Preallocate a vector to hold the result of each replication
tosses_to_r_success <- numeric(reps)

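for (i in seq_len(reps)) {
  successes <- 0  # successes observed so far
  counter <- 0    # tosses made so far
  # Keep tossing until the r-th success occurs
  while (successes < r) {
    counter <- counter + 1
    # Draw a single Bernoulli trial: 1 = success (prob p), 0 = failure
    toss <- sample(c(0, 1), size = 1, prob = c(1 - p, p))
    successes <- successes + toss
  }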
  tosses_to_r_success[i] <- counter
}

# Plot the distribution
data.frame(tosses_to_r_success) %>%
  ggplot(aes(tosses_to_r_success, y = after_stat(prop))) +
  geom_bar() +
  labs(x = "Number of Tosses Until r Successes", y = "Probability")
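
As a possible vectorized alternative to the explicit loop, base R's `rnbinom()` can draw all replications at once. It returns the number of failures before the $r$-th success, so `r` is added to recover the number of trials. The seed is arbitrary, and the sample mean and variance should land near $r/p$ and $r(1-p)/p^2$.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

reps <- 10000  # number of replications
r <- 6         # number of successes
p <- 0.5       # probability of a success

# rnbinom() draws failures before the r-th success; add r to get total trials
trials <- rnbinom(reps, size = r, prob = p) + r

mean(trials)  # compare with r / p
var(trials)   # compare with r * (1 - p) / p^2
```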