## Fundamentals of Probability

Probability is the branch of mathematics that deals with the likelihood that certain outcomes will occur. There are five basic rules, or axioms, that one must understand while studying the fundamentals of probability.

### Learning Objectives

Explain the most basic and most important rules in determining the probability of an event

### Key Takeaways

#### KEY POINTS

- Probability is a number that can be assigned to outcomes and events. It is always greater than or equal to zero, and less than or equal to one.
- The sum of the probabilities of all outcomes must equal [latex]1[/latex].
- If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities.
- The probability that an event does not occur is [latex]1[/latex] minus the probability that the event does occur.
- Two events [latex]text{A}[/latex] and [latex]text{B}[/latex] are independent if knowing that one occurs does not change the probability that the other occurs.

#### KEY TERMS

**experiment**: Something that is done that produces measurable results, called outcomes.

**outcome**: One of the individual results that can occur in an experiment.

**event**: A subset of the sample space.

**sample space**: The set of all outcomes of an experiment.

In discrete probability, we assume a well-defined experiment, such as flipping a coin or rolling a die. Each individual result which could occur is called an outcome. The set of all outcomes is called the sample space, and any subset of the sample space is called an event.

For example, consider the experiment of flipping a coin two times. There are four individual outcomes, namely [latex]text{HH},text{HT},text{TH},text{TT}[/latex]. The sample space is thus [latex]{text{HH},text{HT},text{TH},text{TT}}[/latex]. The event “at least one heads occurs” would be the set [latex]{text{HH},text{HT},text{TH}}[/latex]. If the coin were a normal coin, we would assign the probability of 1/4 to each outcome.
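These counts can be verified with a short sketch in Python (illustrative; any language would do). It enumerates the sample space and computes the probability of "at least one heads" by counting equally likely outcomes:

```python
from itertools import product

# Sample space for flipping a fair coin two times
sample_space = [''.join(flips) for flips in product('HT', repeat=2)]

# Event "at least one heads occurs"
at_least_one_head = [o for o in sample_space if 'H' in o]

# With equally likely outcomes, P(E) = |E| / |S|
p = len(at_least_one_head) / len(sample_space)
print(sorted(sample_space))  # ['HH', 'HT', 'TH', 'TT']
print(p)                     # 0.75
```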

In probability theory, the probability [latex]text{P}[/latex] of some event [latex]text{E}[/latex], denoted [latex]text{P}left(text{E}right)[/latex], is usually defined in such a way that [latex]text{P}[/latex] satisfies a number of axioms, or rules. The most basic and most important rules are listed below.

### Probability Rules

Probability is a number. It is always greater than or equal to zero, and less than or equal to one. This can be written as [latex]0leq{text{P}}left(text{A}right)leq{1}[/latex]. An impossible event, or an event that never occurs, has a probability of [latex]0[/latex]. An event that always occurs has a probability of [latex]1[/latex]. An event with a probability of [latex]0.5[/latex] will occur half of the time.

The sum of the probabilities of all possibilities must equal [latex]1[/latex]. Some outcome must occur on every trial, and the sum of all probabilities is 100%, or in this case, [latex]1[/latex]. This can be written as [latex]text{P}left(text{S}right)=1[/latex], where [latex]text{S}[/latex] represents the entire sample space.

If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities. If one event occurs in 30% of the trials, a different event occurs in 20% of the trials, and the two cannot occur together (that is, they are disjoint), then the probability that one or the other occurs is 30%+20%=50%. This is sometimes referred to as the addition rule, and can be written as follows: [latex]text{P}left({text{A}} text{ or} {text{ B}}right)=text{P}left(text{A}right)+text{P}left(text{B}right)[/latex]. The word “or” in mathematics means the same thing as the union, which uses the following symbol: [latex]cup[/latex]. Thus when [latex]text{A}[/latex] and [latex]text{B}[/latex] are disjoint, we have [latex]text{P}left(text{A}cup{text{B}}right)=text{P}left(text{A}right)+text{P}left(text{B}right)[/latex].

The probability that an event does not occur is [latex]1[/latex] minus the probability that the event does occur. If an event occurs in 60% of all trials, it fails to occur in the other 40%, because 100%−60%=40%. The probability that an event occurs and the probability that it does not occur always add up to 100%, or [latex]1[/latex]. Such events are called complementary events, and this rule is sometimes called the complement rule. It can be written as [latex]text{P}left(text{A}^text{c}right)=1−text{P}left(text{A}right)[/latex], where [latex]text{A}^text{c}[/latex] is the complement of [latex]text{A}[/latex].

Two events [latex]text{A}[/latex] and [latex]text{B}[/latex] are independent if knowing that one occurs does not change the probability that the other occurs. If [latex]text{A}[/latex] and [latex]text{B}[/latex] are independent, then [latex]text{P}left(text{A} text{ and} text{ B}right)=text{P}left(text{A}right)text{P}left(text{B}right)[/latex]; this is often called the multiplication rule. The word “and” in mathematics means the same thing as the intersection, which uses the following symbol: [latex]cap[/latex]. Therefore when [latex]text{A}[/latex] and [latex]text{B}[/latex] are independent, we have [latex]text{P}left(text{A}cap{text{B}}right)=text{P}left(text{A}right)text{P}left(text{B}right)[/latex].

#### Extension of the Example

Elaborating on our example above of flipping a coin two times, we assign the probability [latex]1/4[/latex] to each of the [latex]4[/latex] outcomes. We consider each of the five rules above in the context of this example.

1. Note that each probability is [latex]1/4[/latex], which is between [latex]0[/latex] and [latex]1[/latex].

2. Note that the sum of all the probabilities is [latex]1[/latex], since [latex]frac{1}{4}+frac{1}{4}+frac{1}{4}+frac{1}{4}=1[/latex].

3. Suppose [latex]text{A}[/latex] is the event that exactly one head occurs, and [latex]text{B}[/latex] is the event that exactly two tails occur. Then [latex]text{A}={text{HT},text{TH}}[/latex] and [latex]text{B}={text{TT}}[/latex] are disjoint. Also, [latex]text{P}left(text{A}cup{text{B}}right)=frac{3}{4}=frac{2}{4}+frac{1}{4}=text{P}left(text{A}right)+text{P}left(text{B}right)[/latex].

4. The probability that no heads occurs is [latex]1/4[/latex], which is equal to [latex]1−3/4[/latex]. So if [latex]text{A}={text{HT},text{TH},text{HH}}[/latex] is the event that a head occurs, we have [latex]text{P}left(text{A}^text{c}right)=frac{1}{4}=1−frac{3}{4}=1−text{P}left(text{A}right)[/latex].

5. If [latex]text{A}[/latex] is the event that the first flip is a heads and [latex]text{B}[/latex] is the event that the second flip is a heads, then [latex]text{A}[/latex] and [latex]text{B}[/latex] are independent. We have [latex]text{A}={text{HT},text{HH}}[/latex] and [latex]text{B}={text{TH},text{HH}}[/latex] and [latex]text{A}cap{text{B}}={text{HH}}[/latex]. Note that [latex]text{P}left(text{A}cap{text{B}}right)=frac{1}{4}=frac{1}{2}cdot{frac{1}{2}}=text{P}left(text{A}right)text{P}left(text{B}right)[/latex].
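All five rules can be checked mechanically for this example. The following Python sketch treats events as sets of outcomes and computes probabilities by counting:

```python
from itertools import product

sample_space = {''.join(f) for f in product('HT', repeat=2)}
P = lambda event: len(event) / len(sample_space)  # equally likely outcomes

A = {'HT', 'TH'}            # exactly one head
B = {'TT'}                  # exactly two tails
heads = {'HT', 'TH', 'HH'}  # at least one head
first_H = {'HT', 'HH'}      # first flip is heads
second_H = {'TH', 'HH'}     # second flip is heads

assert all(0 <= P({o}) <= 1 for o in sample_space)        # Rule 1
assert P(sample_space) == 1                               # Rule 2
assert P(A | B) == P(A) + P(B)                            # Rule 3 (disjoint)
assert P(sample_space - heads) == 1 - P(heads)            # Rule 4 (complement)
assert P(first_H & second_H) == P(first_H) * P(second_H)  # Rule 5 (independence)
```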

## Conditional Probability

The conditional probability of an event is the probability that an event will occur given that another event has occurred.

### Learning Objectives

Explain the significance of Bayes’ theorem in manipulating conditional probabilities

### Key Takeaways

#### KEY POINTS

- The conditional probability [latex]text{P}left(text{B}mid{text{A}}right)[/latex] of an event [latex]text{B}[/latex], given an event [latex]text{A}[/latex], is defined by: [latex]text{P}left(text{B}mid{text{A}}right)=frac{text{P}left(text{A}cap{text{B}}right)}{text{P}left(text{A}right)}[/latex], when [latex]text{P}left(text{A}right)>0[/latex].
- If the knowledge that event [latex]text{A}[/latex] occurs does not change the probability that event [latex]text{B}[/latex] occurs, then [latex]text{A}[/latex] and [latex]text{B}[/latex] are independent events, and thus, [latex]text{P}left(text{B}mid{text{A}}right)=text{P}left(text{B}right)[/latex].
- Mathematically, Bayes’ theorem gives the relationship between the probabilities of [latex]text{A}[/latex] and [latex]text{B}[/latex], [latex]text{P}left(text{A}right)[/latex] and [latex]text{P}left(text{B}right)[/latex], and the conditional probabilities of [latex]text{A}[/latex] given [latex]text{B}[/latex] and [latex]text{B}[/latex] given [latex]text{A}[/latex], [latex]text{P}left(text{A}mid{text{B}}right)[/latex] and [latex]text{P}left(text{B}mid{text{A}}right)[/latex]. In its most common form, it is: [latex]text{P}left(text{A}mid{text{B}}right)=frac{text{P}left(text{B}mid{text{A}}right)text{P}left(text{A}right)}{text{P}left(text{B}right)}[/latex].

#### KEY TERMS

**conditional probability**: The probability that an event will take place given the restrictive assumption that another event has taken place, or that a combination of other events has taken place.

**independent**: Not dependent; not contingent or depending on something else; free.

### Probability of B Given That A Has Occurred

Our estimation of the likelihood of an event can change if we know that some other event has occurred. For example, the probability that a rolled die shows a [latex]2[/latex] is [latex]1/6[/latex] without any other information, but if someone looks at the die and tells you that it is an even number, the probability is now [latex]1/3[/latex] that it is a [latex]2[/latex]. The notation [latex]text{P}left(text{B}mid{text{A}}right)[/latex] indicates a conditional probability, meaning the probability of one event under the condition that we know another event has happened. The bar “[latex]mid[/latex]” can be read as “given”, so that [latex]text{P}left(text{B}mid{text{A}}right)[/latex] is read as “the probability of [latex]text{B}[/latex] given that [latex]text{A}[/latex] has occurred”.

The conditional probability [latex]text{P}left(text{B}mid{text{A}}right)[/latex] of an event [latex]text{B}[/latex], given an event [latex]text{A}[/latex], is defined by:

[latex]text{P}left(text{B}mid{text{A}}right)=frac{text{P}left(text{A}cap{text{B}}right)}{text{P}left(text{A}right)}[/latex]

when [latex]text{P}left(text{A}right)>0[/latex]. Be sure to remember the distinct roles of [latex]text{B}[/latex] and [latex]text{A}[/latex] in this formula. The event after the bar is the one we are assuming has occurred, and its probability appears in the denominator of the formula.

### Example

Suppose that a coin is flipped 3 times giving the sample space:

[latex]text{S}={text{HHH},text{HHT},text{HTH},text{THH},text{TTH},text{THT},text{HTT},text{TTT}}[/latex]

Each individual outcome has probability [latex]1/8[/latex]. Suppose that [latex]text{B}[/latex] is the event that at least one heads occurs and [latex]text{A}[/latex] is the event that all 3 coins are the same. Then the probability of [latex]text{B}[/latex] given [latex]text{A}[/latex] is [latex]1/2[/latex], since [latex]text{A}cap{text{B}}={text{HHH}}[/latex] which has probability [latex]1/8[/latex] and [latex]text{A}={text{HHH},text{TTT}}[/latex] which has probability [latex]2/8[/latex], and [latex]frac{1/8}{2/8}=frac{1}{2}[/latex].
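This computation can be reproduced by counting outcomes; a short Python sketch:

```python
from itertools import product

# Sample space for flipping a fair coin 3 times
sample_space = {''.join(f) for f in product('HT', repeat=3)}
P = lambda event: len(event) / len(sample_space)

B = {o for o in sample_space if 'H' in o}          # at least one heads
A = {o for o in sample_space if len(set(o)) == 1}  # all 3 coins the same

# Conditional probability: P(B | A) = P(A ∩ B) / P(A)
p_B_given_A = P(A & B) / P(A)
print(p_B_given_A)  # 0.5
```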

### Independence

The conditional probability [latex]text{P}left(text{B}mid{text{A}}right)[/latex] is not always equal to the unconditional probability [latex]text{P}left(text{B}right)[/latex]. The reason behind this is that the occurrence of event [latex]text{A}[/latex] may provide extra information that can change the probability that event [latex]text{B}[/latex] occurs. If the knowledge that event [latex]text{A}[/latex] occurs does not change the probability that event [latex]text{B}[/latex] occurs, then [latex]text{A}[/latex] and [latex]text{B}[/latex] are independent events, and thus, [latex]text{P}left(text{B}mid{text{A}}right)=text{P}left(text{B}right)[/latex].

### Bayes’ Theorem

In probability theory and statistics, Bayes’ theorem (alternatively Bayes’ law or Bayes’ rule) is a result that is of importance in the mathematical manipulation of conditional probabilities. It can be derived from the basic axioms of probability.

Mathematically, Bayes’ theorem gives the relationship between the probabilities of [latex]text{A}[/latex] and [latex]text{B}[/latex], [latex]text{P}left(text{A}right)[/latex] and [latex]text{P}left(text{B}right)[/latex], and the conditional probabilities of [latex]text{A}[/latex] given [latex]text{B}[/latex] and [latex]text{B}[/latex] given [latex]text{A}[/latex]. In its most common form, it is:

[latex]text{P}left(text{A}mid{text{B}}right)=frac{text{P}left(text{B}mid{text{A}}right)text{P}left(text{A}right)}{text{P}left(text{B}right)}[/latex]

This may be easier to remember in this alternate symmetric form:

[latex]frac{text{P}left(text{A}mid{text{B}}right)}{text{P}left(text{B}mid{text{A}}right)}=frac{text{P}left(text{A}right)}{text{P}left(text{B}right)}[/latex]

### Example

Suppose someone told you they had a nice conversation with someone on the train. Not knowing anything else about this conversation, the probability that they were speaking to a woman is 50%. Now suppose they also told you that this person had long hair. It is now more likely they were speaking to a woman, since women in this city are more likely to have long hair than men. Bayes’ theorem can be used to calculate the probability that the person is a woman.

To see how this is done, let [latex]text{W}[/latex] represent the event that the conversation was held with a woman, and [latex]text{L}[/latex] denote the event that the conversation was held with a long-haired person. It can be assumed that women constitute half the population for this example. So, not knowing anything else, the probability that [latex]text{W}[/latex] occurs is [latex]text{P}left(text{W}right)=0.5[/latex].

Suppose it is also known that 75% of women in this city have long hair, which we denote as [latex]text{P}left(text{L}mid{text{W}}right)=0.75[/latex]. Likewise, suppose it is known that 25% of men in this city have long hair, or [latex]text{P}left(text{L}mid{text{M}}right)=0.25[/latex], where [latex]text{M}[/latex] is the complementary event of [latex]text{W}[/latex], i.e., the event that the conversation was held with a man (assuming that every human is either a man or a woman).

Our goal is to calculate the probability that the conversation was held with a woman, given the fact that the person had long hair, or, in our notation, [latex]text{P}(text{W}mid{text{L}})[/latex]. Using the formula for Bayes’ theorem, we have:

[latex]text{P}left(text{W}mid{text{L}}right)=frac{text{P}left(text{L}mid{text{W}}right)text{P}left(text{W}right)}{text{P}left(text{L}right)}=frac{text{P}left(text{L}mid{text{W}}right)text{P}left(text{W}right)}{text{P}left(text{L}mid{text{W}}right)text{P}left(text{W}right)+text{P}left(text{L}mid{text{M}}right)text{P}left(text{M}right)}=frac{0.75cdot{0.5}}{0.75cdot{0.5}+0.25cdot{0.5}}=0.75[/latex]
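The same computation can be written as a small Python function (the helper name `bayes` is ours, for illustration); it expands [latex]text{P}left(text{L}right)[/latex] by total probability exactly as in the denominator above:

```python
def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """P(A | B) via Bayes' theorem, expanding P(B) by total probability."""
    p_not_a = 1 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return p_b_given_a * p_a / p_b

# P(L | W) = 0.75, P(W) = 0.5, P(L | M) = 0.25
print(bayes(0.75, 0.5, 0.25))  # 0.75
```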

## Unions and Intersections

Union and intersection are two key concepts in set theory and probability.

### Learning Objectives

Give examples of the intersection and the union of two or more sets

### Key Takeaways

#### KEY POINTS

- The union of two or more sets is the set that contains all the elements of the two or more sets. Union is denoted by the symbol [latex]cup[/latex].
- The general probability addition rule for the union of two events states that [latex]text{P}left(text{A}cup{text{B}}right)=text{P}left(text{A}right)+text{P}left(text{B}right)−text{P}left(text{A}cap{text{B}}right)[/latex], where [latex]text{A}cap{text{B}}[/latex] is the intersection of the two sets.
- The addition rule can be shortened if the sets are disjoint: [latex]text{P}left(text{A}cup{text{B}}right)=text{P}left(text{A}right)+text{P}left(text{B}right)[/latex]. This can even be extended to more sets if they are all disjoint: [latex]text{P}left(text{A}cup{text{B}}cup{text{C}}right)=text{P}left(text{A}right)+text{P}left(text{B}right)+text{P}left(text{C}right)[/latex].
- The intersection of two or more sets is the set of elements that are common to every set. The symbol [latex]cap[/latex] is used to denote the intersection.
- When events are independent, we can use the multiplication rule for independent events, which states that [latex]text{P}left(text{A}cap{text{B}}right)=text{P}left(text{A}right)text{P}left(text{B}right)[/latex].

#### KEY TERMS

**independent**: Not contingent or dependent on something else.

**disjoint**: Having no members in common; having an intersection equal to the empty set.

Probability uses the mathematical ideas of sets, as we have seen in the definition of both the sample space of an experiment and in the definition of an event. In order to perform basic probability calculations, we need to review the ideas from set theory related to the set operations of union, intersection, and complement.

### Union

The union of two or more sets is the set that contains all the elements of each of the sets; an element is in the union if it belongs to at least one of the sets. The symbol for union is [latex]cup[/latex], and is associated with the word “or”, because [latex]text{A}cup{text{B}}[/latex] is the set of all elements that are in [latex]text{A}[/latex] or [latex]text{B}[/latex] (or both). To find the union of two sets, list the elements that are in either (or both) sets. In terms of a Venn Diagram, the union of sets [latex]text{A}[/latex] and [latex]text{B}[/latex] can be shown as two completely shaded interlocking circles.

In symbols, since the union of [latex]text{A}[/latex] and [latex]text{B}[/latex] contains all the points that are in [latex]text{A}[/latex] or [latex]text{B}[/latex] or both, the definition of the union is:

[latex]text{A}cup{text{B}}={text{x}:text{x}in{text{A}} text{ or } text{x}in{text{B}}}[/latex]

For example, if [latex]text{A}={1,3,5,7}[/latex] and [latex]text{B}={1,2,4,6}[/latex], then [latex]text{A}cup{text{B}}={1,2,3,4,5,6,7}[/latex]. Notice that the element [latex]1[/latex] is not listed twice in the union, even though it appears in both sets [latex]text{A}[/latex] and [latex]text{B}[/latex]. This leads us to the general addition rule for the union of two events:

[latex]text{P}left(text{A}cup{text{B}}right)=text{P}left(text{A}right)+text{P}left(text{B}right)−text{P}left(text{A}cap{text{B}}right)[/latex]

where [latex]text{P}left(text{A}cap{text{B}}right)[/latex] is the probability of the intersection of the two sets. We must subtract it to avoid double counting elements that belong to both sets.

If sets [latex]text{A}[/latex] and [latex]text{B}[/latex] are disjoint, however, the event [latex]text{A}cap{text{B}}[/latex] has no outcomes in it, and is an empty set denoted as ∅, which has a probability of zero. So, the above rule can be shortened for disjoint sets only:

[latex]text{P}left(text{A}cup{text{B}}right)=text{P}left(text{A}right)+text{P}left(text{B}right)[/latex]

This can even be extended to more sets if they are all disjoint:

[latex]text{P}left(text{A}cup{text{B}}cup{text{C}}right)=text{P}left(text{A}right)+text{P}left(text{B}right)+text{P}left(text{C}right)[/latex]
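Python's built-in sets make the addition rule easy to verify. The sketch below uses a single die roll, with `Fraction` for exact arithmetic (the two events are our own choices, for illustration):

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
P = lambda e: Fraction(len(e), len(sample_space))

A = {2, 4, 6}  # roll is even
B = {5, 6}     # roll is greater than 4

# General addition rule: subtract the overlap to avoid double counting
assert P(A | B) == P(A) + P(B) - P(A & B)
print(P(A | B))  # 2/3
```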

### Intersection

The intersection of two or more sets is the set of elements that are common to each of the sets. An element is in the intersection if it belongs to all of the sets. The symbol for intersection is [latex]cap[/latex], and is associated with the word “and”, because [latex]text{A}cap{text{B}}[/latex] is the set of elements that are in [latex]text{A}[/latex] and [latex]text{B}[/latex] simultaneously. To find the intersection of two (or more) sets, include only those elements that are listed in both (or all) of the sets. In terms of a Venn Diagram, the intersection of two sets [latex]text{A}[/latex] and [latex]text{B}[/latex] can be shown as the shaded region in the middle of two interlocking circles.

In mathematical notation, the intersection of [latex]text{A}[/latex] and [latex]text{B}[/latex] is written as [latex]text{A}cap{text{B}}={text{x}:text{x}in{text{A}}[/latex] and [latex]text{x}in{text{B}}}[/latex]. For example, if [latex]text{A}={1,3,5,7}[/latex] and [latex]text{B}={1,2,4,6}[/latex], then [latex]text{A}cap{text{B}}={1}[/latex] because [latex]1[/latex] is the only element that appears in both sets [latex]text{A}[/latex] and [latex]text{B}[/latex].

When events are independent, meaning that the outcome of one event doesn’t affect the outcome of another event, we can use the multiplication rule for independent events, which states:

[latex]text{P}left(text{A}cap{text{B}}right)=text{P}left(text{A}right)text{P}left(text{B}right)[/latex]

For example, let’s say we were tossing a coin twice, and we want to know the probability of tossing two heads. Since the first toss doesn’t affect the second toss, the events are independent. If [latex]text{A}[/latex] is the event that the first toss is a heads and [latex]text{B}[/latex] is the event that the second toss is a heads, then [latex]text{P}left(text{A}cap{text{B}}right)=frac{1}{2}cdotfrac{1}{2}=frac{1}{4}[/latex].
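The multiplication rule can also be checked empirically. The following Python sketch simulates many pairs of independent tosses (the seed and trial count are arbitrary choices); the observed frequency of two heads should be close to [latex]1/4[/latex]:

```python
import random

# Monte Carlo check of the multiplication rule for two fair coin tosses
random.seed(0)        # fixed seed so the run is reproducible
trials = 100_000
both_heads = sum(
    random.random() < 0.5 and random.random() < 0.5
    for _ in range(trials)
)
freq = both_heads / trials
print(freq)  # close to P(A)P(B) = 0.25
```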

## Complementary Events

The complement of [latex]text{A}[/latex] is the event in which [latex]text{A}[/latex] does not occur.

### Learning Objectives

Explain an example of a complementary event

### Key Takeaways

#### KEY POINTS

- The complement of an event [latex]text{A}[/latex] is usually denoted as [latex]text{A}′[/latex], [latex]text{A}^text{c}[/latex] or [latex]bar{text{A}}[/latex].
- An event and its complement are mutually exclusive, meaning that if one of the two events occurs, the other event cannot occur.
- An event and its complement are exhaustive, meaning that both events cover all possibilities.

#### KEY TERMS

**exhaustive**: Including every possible element.

**mutually exclusive**: Describing multiple events or states of being such that the occurrence of any one implies the non-occurrence of all the others.

### What are Complementary Events?

In probability theory, the complement of any event [latex]text{A}[/latex] is the event [latex][text{not A}][/latex], i.e. the event in which [latex]text{A}[/latex] does not occur. The event [latex]text{A}[/latex] and its complement [latex][text{not A}][/latex] are mutually exclusive and exhaustive, meaning that if one occurs, the other does not, and that together they cover all possibilities. There is exactly one event [latex]text{B}[/latex] such that [latex]text{A}[/latex] and [latex]text{B}[/latex] are both mutually exclusive and exhaustive; that event is the complement of [latex]text{A}[/latex]. The complement of an event [latex]text{A}[/latex] is usually denoted as [latex]text{A}′[/latex], [latex]text{A}^c[/latex] or [latex]bar{text{A}}[/latex].

### Examples

#### Simple Examples

A common example used to demonstrate complementary events is the flip of a coin. Let’s say a coin is flipped and one assumes it cannot land on its edge. It can either land on heads or on tails. There are no other possibilities (exhaustive), and both events cannot occur at the same time (mutually exclusive). Because these two events are complementary, we know that [latex]text{P}left(text{heads}right)+text{P}left(text{tails}right)=1[/latex].

Another simple example of complementary events is picking a ball out of a bag. Let’s say there are three plastic balls in a bag. One is blue and two are red. Assuming that each ball has an equal chance of being pulled out of the bag, we know that [latex]text{P}left(text{blue}right)=frac{1}{3}[/latex] and [latex]text{P}left(text{red}right)=frac{2}{3}[/latex]. Since we can only either chose blue or red (exhaustive) and we cannot choose both at the same time (mutually exclusive), choosing blue and choosing red are complementary events, and [latex]text{P}left(text{blue}right)+text{P}left(text{red}right)=1[/latex].
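A sketch of the ball example in Python, using exact fractions:

```python
from fractions import Fraction

# Three equally likely balls: one blue, two red
balls = ['blue', 'red', 'red']
P = lambda color: Fraction(balls.count(color), len(balls))

assert P('blue') == Fraction(1, 3)
assert P('red') == Fraction(2, 3)
assert P('blue') + P('red') == 1  # complementary events sum to 1
```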

Finally, let’s examine a non-example of complementary events. If you were asked to choose any number, you might think that that number could either be prime or composite. Clearly, a number cannot be both prime and composite, so that takes care of the mutually exclusive property. However, being prime and being composite are not exhaustive, because the number [latex]1[/latex] is neither prime nor composite.

Source: Statistics