Mastering Homogeneity Hypothesis testing

Contingency Tests Overview

Intro

In the world of statistical analysis, contingency tests play a crucial role in examining the relationship between two categorical variables. These tests are essential tools for researchers across various disciplines, enabling them to determine whether there is a significant correlation between the variables of interest.

Real-world Relevance

To illustrate the practical significance of contingency tests, let’s consider a real-world scenario: imagine a market research team is investigating the relationship between customer satisfaction (a few levels e.g. Satisfied, Neutral, Dissatisfied) and the type of product purchased (there are multiple products) from an online marketplace. They collect data from a limited sample of customers who recently made purchases building a contingency table. By applying a contingency test, such as Fisher’s exact or Chi-squared test, researchers can determine whether customer satisfaction and the type of product purchased are connected.

Focus

This notebook embarks on a journey to explore the subtleties of Fisher’s exact vs. Chi-squared tests application, delving into Fisher’s implementation nuances, and performance characteristics. A comparative analysis of two implementations of the Fisher’s exact test is covered: one crafted in pure Python and the other leveraging the statistical package of R through the rpy2 library. Furthermore, we’ll scrutinize the performance and accuracy of both approaches, comparing their results and dissecting their respective advantages.

Background: Fisher’s Exact and Chi-Squared

Contingency Tables and Statistical Independence

Contingency tables serve as the foundation for these tests, presenting the observed frequencies of different combinations of categories for the two or more variables. By analyzing the distribution of frequencies within the table, contingency tests help to assess whether the observed patterns are likely due to chance or reflect a genuine interconnection between the variables.

Code

table = [[1, 24, 5], [5, 20, 7], [14, 11, 7], [11, 14, 8], [10, 10, 10], [12, 12, 12]]

Fisher’s Exact

Fisher’s Exact Test, a non-parametric statistical test, plays a pivotal role in hypothesis testing for categorical data. This test is particularly valuable when dealing with small sample sizes, where the assumptions of the chi-squared test are violated.

Derivation of the Hypergeometric Distribution

The hypergeometric distribution arises from a scenario involving sampling without replacement from a finite population containing two types of objects: “successes” and “failures.” In the context of contingency tables, these objects correspond to the different categories of the two variables being analyzed.

Consider a population of size N, containing K objects classified as “successes” and N-K objects classified as “failures.” We draw a sample of size n without replacement from this population. The hypergeometric distribution describes the probability of obtaining exactly k successes in the sample.

The Hypergeometric Distribution Formula

The probability mass function of the hypergeometric distribution is given by:

$$P(X=k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$$

where:

N: Total population size
K: Number of successes in the population
n: Sample size
k: Number of successes in the sample

Fisher’s Exact Test: Applying the Hypergeometric Distribution

Fisher’s Exact Test leverages the hypergeometric distribution to calculate the exact probability of observing a given contingency table, or one more extreme, assuming the null hypothesis of independence between the variables. This exact probability is then used to assess the statistical significance of the observed association.

Chi-Squared

The Chi-squared test is another widely used method for analyzing contingency tables to determine whether there is a significant connection between categorical variables. It relies on a statistical approach based on the Chi-squared distribution.

Chi-Squared Statistic: Measuring the Difference

The Chi-squared test calculates a test statistic, denoted by $\chi^2$, which quantifies the difference between the observed frequencies in the contingency table and the frequencies expected under the null hypothesis of independence. The null hypothesis assumes that there is no association between the variables, meaning that the observed frequencies should be close to the expected frequencies.

Testing the Null Hypothesis

The calculated chi-squared statistic is then compared to the chi-squared distribution with degrees of freedom determined by the dimensions of the contingency table. If the table has $N \times M$ size then degrees of freedom = $(N-1)(M-1)$

Assumptions and Limitations

Both of these procedures, like any statistical test, operates under certain assumptions which are crucial for ensuring the validity of the test’s results. Their common requirements are:

Categorical Data: The variables being analyzed must be categorical, meaning they can be divided into distinct categories or groups.
Independent Observations: The observations in the contingency table should be independent of each other. This means that the outcome of one observation should not influence the outcome of another observation.

In addition each of them has a third extra requirement

Fisher: Fixed Margins

The row and column totals in the contingency table are considered fixed. This implies that the sample sizes for each category are predetermined.

Chi-squared: Sample sizes

The expected cell frequencies should be sufficiently large for the chi-squared approximation to be valid

Apparently namely the third condition for each is the most challenging.

The third assumption for Fisher is quite strict and is not usually satisfied in practice. There are other representatives of exact test’s family that are free of this requirement like Boschloo’s or Barnard’s tests, although they are much more computationally expensive, as they require the nuisance parameters estimation and it’s not feasible to implement them for the tables larger than 2x2. So the performance is the main issue of the exact tests and if it’s the case then Chi-squared test is advised to be applied instead.

For Chi-squared violations of the third assumption can lead to inaccurate results. In such cases, Fisher’s exact test is often preferred due to its ability to handle small sample sizes and sparse tables where the chi-squared test’s approximations may not hold true.

Fisher’s Exact Test Implementation

Pythonic Fisher NxM

As long as widely used python packages for statistics like scipy or statsmodels don’t furnish Fisher’s exact test for tables larger than 2x2, here is the author’s pure Pythonic implementation for this procedure, to get more details follow the function documentation below.

Code

import math


def _pvalue(func: object, shape: tuple=(2,2)) -> str:
    print(f"p-value: {func([row[:shape[1]] for row in table[:shape[0]]]):.5f}")


def NxM_Fisher_exact_test(table: list[list]) -> float:
    """
    Performs Fisher's exact test for a contingency table of an arbitrary size.

    Parameters
    ----------
    table: list[list]
        contigency matrix M x N

    Returns
    -------
    p-value: float
    """
    num_rows = len(table)
    num_cols = len(table[0])

    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(num_rows)) for j in range(num_cols)]

    log_p_constant = (
        sum(math.lgamma(x + 1) for x in row_sums)
        + sum(math.lgamma(y + 1) for y in col_sums)
        - math.lgamma(sum(row_sums) + 1)
    )

    def calculate_log_probability(matrix):
        """
        Calculates the log-probability of a contingency table n x m.

        Fisher's statistic under the truthful null hypothesis has a
        hypergeometric distribution of the numbers in the cells of the table.

        Therefore the probability of the contingency table follows
        hypergeometric probability mass function $C^K_k * C^N-K_n-k / C^N_n$

        So, simplifying it's clear that the probability follows:
        the product of factorials of total row and total columns counts
        divided by the total count factorial and factorials of each cell count.

        row_1! x..x row_n! x col_1! x..x col_m! / (cell_11! x..x cell_nm! x total!)

        1. As the gamma function satisfies: gamma(n + 1) = n!
        and it's computationally more stable- it's used instead of factorials.

        2. Making the computations more stable I'm switching from product to sum
        using logarithmic probability.

        """
        return log_p_constant - sum(
                math.lgamma(cell + 1) for row in matrix for cell in row
        )

    log_p_obs = calculate_log_probability(table)

    p_value = 0

    def dfs(matrix: list[list], row_id, col_id, tol=1e-10):
        """
        Recursive deep-first search function

        Generates all possible contingency tables and calculates their
        log-probability adding up those, that are at least as extreme as
        the observed contingency table, to the total p-value

        Args:
            matrix: A list of lists representing the contingency table
            row_id: Row index up to which the table is already filled
            col_id: Column index up to which the table is already filled
            tol: Maximum absolute log-probability comparison error

        Returns:
            None
        """
        nonlocal p_value

        # Copy is necessary to make recursion working
        table = [row.copy() for row in matrix]

        # Stopping condition - only the last row and column are left
        if row_id == num_rows - 1 and col_id == num_cols - 1:

            for i in range(row_id): # fill last column
                table[i][col_id] = row_sums[i] - sum(table[i][:col_id])
            for j in range(col_id): # fill last row
                table[row_id][j] = col_sums[j] - sum(table[i][j] for i in range(row_id))

            bottom_right_cell = row_sums[row_id] - sum(table[row_id][:col_id])

            if bottom_right_cell < 0:
                # Non-reliable table, all cells must be non-negative
                return

            else:
                table[row_id][col_id] = bottom_right_cell
                log_p = calculate_log_probability(table)

                if log_p <= log_p_obs + tol:
                    p_value += math.exp(log_p)

                return

        # Fill the table until the Stopping condition isn't met
        else:

            remaining_row_sum = row_sums[row_id] - sum(table[row_id])
            remaining_col_sum = col_sums[col_id] - sum(table[i][col_id] for i in range(num_rows))

            for k in range(min(remaining_row_sum, remaining_col_sum) + 1):

                table[row_id][col_id] = k

                if row_id == num_rows - 2 and col_id == num_cols - 2:
                    dfs(table, row_id + 1, col_id + 1, tol=tol)
                elif row_id == num_rows - 2:
                    dfs(table, 0, col_id + 1, tol=tol)
                else:
                    dfs(table, row_id + 1, col_id, tol=tol)

    dfs(matrix=[[0] * num_cols for _ in range(num_rows)], row_id=0, col_id=0)

    return p_value

While this exact test above is a precise solution, it does have limitations related to the computational intensity of the test, especially when dealing with large contingency tables. As the table size increases, the number of possible arrangements of data grows exponentially, making the calculations more time-consuming.

_pvalue(NxM_Fisher_exact_test, shape=(3, 2))

p-value: 0.00014

R Fisher NxM

Another option that is to use rpy bridge from Python to R, this function works for an arbitrary shape of contingency table and unfortunately doesn’t have alternatives in Python.

Code

import numpy as np

import rpy2.robjects.numpy2ri
from rpy2.robjects.packages import importr


def R_fisher_exact_test(table: list[list]) -> float:
    """
    Performs exact Fisher's test using R

    Parameters
    ----------
    table: list[list]
        contigency matrix M x N

    Returns
    -------
    p-value: float
    """

    # Enable automatic conversion between NumPy and R arrays
    rpy2.robjects.numpy2ri.activate()

    # Import necessary R package
    stats = importr('stats')

    # Perform Fisher's test using the R function with more memory to get p-value
    result = stats.fisher_test(np.array(table), workspace = 2e9)

    # Extract the p-value
    p_value = result[0][0]

    return p_value

Note that the rpy2 package has native dependencies, what in particular means that, installed R accompanied by the corresponding libraries is required.

_pvalue(R_fisher_exact_test, shape=(3, 2))

p-value: 0.00014

Algorithmic Differences: Python vs R

While both the Python and R implementations ultimately calculate the p-value for Fisher’s Exact Test, they employ distinct algorithms under the hood, each with its own strengths and weaknesses. Understanding these differences is crucial for selecting the most appropriate implementation for a given scenario.

Python: utilizes a recursive algorithm to enumerate all possible contingency tables that could arise under the null hypothesis. This approach, while conceptually straightforward, can become computationally expensive.

R: in contrast, it leverages optimized algorithms and data structures that are specifically designed for efficient calculation of Fisher’s Exact Test. These algorithms, often implemented in compiled languages, take advantage of advanced numerical techniques and data representations to minimize computational overhead.

Performance Comparison

Trade-offs and Considerations:

The choice between the Python and R implementations depends on the specific needs of the analysis. For smaller tables, the Python implementation may suffice, offering ease of understanding and implementation. However, as the table size increases, the computational advantages of the R implementation become more pronounced.

I suggest we generate random contingency tables with dimensions ranging from 2x2 to 5x5, representing a diverse range of scenarios encountered in real-world applications. For each table size, we will measure the execution time required by both the Python and R implementations to calculate the p-value.

Code

%%time
_pvalue(NxM_Fisher_exact_test, shape=(3, 2))
_pvalue(NxM_Fisher_exact_test, shape=(4, 2))
_pvalue(NxM_Fisher_exact_test, shape=(3, 3))

p-value: 0.00014
p-value: 0.00012
p-value: 0.00085
CPU times: user 745 ms, sys: 7.96 ms, total: 753 ms
Wall time: 756 ms

Code

%%time
_pvalue(R_fisher_exact_test, shape=(3, 2))
_pvalue(R_fisher_exact_test, shape=(4, 2))
_pvalue(R_fisher_exact_test, shape=(3, 3))

p-value: 0.00014
p-value: 0.00012
p-value: 0.00085
CPU times: user 2.6 s, sys: 219 ms, total: 2.82 s
Wall time: 2.82 s

Code

%%time
_pvalue(R_fisher_exact_test, shape=(4, 3))

p-value: 0.00149
CPU times: user 1.46 s, sys: 97 ms, total: 1.55 s
Wall time: 1.88 s

Code

%%time
_pvalue(NxM_Fisher_exact_test, shape=(4, 3))

p-value: 0.00149
CPU times: user 8min, sys: 921 ms, total: 8min 1s
Wall time: 8min 5s

For sizes more than 2 x 5 and 3 x 3, R package function can significantly outperform the Python counterpart, whereas with tables of less size pure Python function is shining.

The results of the benchmarks revealed a clear trend: Python implementation is beaten in terms of execution time for larger tables. As the table dimensions increased, the performance gap between the two implementations is widened drastically. This observation aligns with the algorithmic differences discussed earlier, where the optimized algorithms and data structures employed by the R implementation proved to be more efficient.

As a rule of thumb I propose to apply R if $N \times M > 10$, otherwise Python is preferable and what is more - it doesn’t have any dependencies on non-native packages

Accuracy Comparison

Along with the performance let’s assure that the p-values generated by both methods are equivalent

Code

from scipy.stats import multinomial

from statsmodels.stats.proportion import proportion_confint

P = np.arange(.1, .35, .05)
n = 40

rv = multinomial(n, P)

np.random.seed(2024)

tol = 1e-5
alpha = 0.05
n_iterations = 100


for shape in [(2, 2), (2, 4), (3, 3)]:

    false_positives = 0

    for i in range(n_iterations):

        contingency_table = rv.rvs(shape[0])

        p_value_py = NxM_Fisher_exact_test([row[:shape[1]] for row in contingency_table])
        p_value_r = R_fisher_exact_test([row[:shape[1]] for row in contingency_table])

        if abs(p_value_py - p_value_r) > tol:
            print(f"Different p-values! Python: {p_value_py}, R: {p_value_r}")
            break
        elif p_value_py <= alpha:
            false_positives += 1

    l, r = proportion_confint(count=false_positives, nobs=n_iterations, alpha=0.10, method='wilson')
    print(f"shape: {shape}, false positives: {false_positives/n_iterations:.3f} ± {(r - l) / 2:.3f}")

shape: (2, 2), false positives: 0.020 ± 0.026
shape: (2, 4), false positives: 0.030 ± 0.030
shape: (3, 3), false positives: 0.030 ± 0.030

So, it’s clear from multiple iterations for different tables and sizes that there is no a single case of different p-values, so the equivalence is practically evident. In addition I’ve checked Type I error level, it’s well below the bound of 5%, which means that ideologically the criterions are valid.

Performance Analysis: A Comparative Benchmark

As the textbooks say, Fisher’s Exact Test stands out as a particularly versatile option when dealing with small sample sizes or sparse contingency tables. It provides accurate p-values, even when the assumptions of other commonly used test, such as the chi-squared test, might be violated. This benefit makes Fisher’s Exact test an invaluable method when working with limited data or situations where the chi-squared test’s approximation is not applicable.

I offer you a procedure to challenge these statements, namely to call the power of the exact test out, when chi-squared assumptions are not satisfied. First we will check the correctness of these two methods and then the power.

Code

def chi_squared_challenge(
    shape: tuple = (2, 2),
    n_iterations: int=1_000,
    alpha: float=0.05,
    aa_test: bool=True
) -> None:

    for i in range(1 + len(rv.rvs()[0]) - shape[1]):

        fisher_positives = 0
        chi2_positives = 0
        chi2_yates_positives = 0

        zero_expected_count = 0

        less_than_5 = 0
        less_than_10 = 0

        for _ in range(n_iterations):

            contingency_table = rv.rvs(shape[0])[:, i:i+shape[1]]

            if not aa_test:
                contingency_table[0] = contingency_table[0] ** 2

            if np.min(contingency_table) == 0:
                zero_expected_count += 1
                continue

            less_than_5 += np.max(np.array(contingency_table) < 5)
            less_than_10 += np.max(np.array(contingency_table) < 10)

            if shape == (2, 2):
                p_value_fisher = fisher_exact(contingency_table).pvalue
                p_value_chi2_yates = chi2_contingency(contingency_table, correction=True).pvalue
                if p_value_chi2_yates <= alpha:
                    chi2_yates_positives += 1
            else:
                p_value_fisher = NxM_Fisher_exact_test(contingency_table)
            p_value_chi2 = chi2_contingency(contingency_table, correction=False).pvalue


            if p_value_chi2 <= alpha:
                chi2_positives += 1
            if p_value_fisher <= alpha:
                fisher_positives += 1

        valid_tables = n_iterations - zero_expected_count

        print(
            f"\nIf out of {valid_tables} valid {shape[0]}x{shape[1]} tables "
            f"(w/o zero expected count) number of tables with less than:"
            f"\n - 5 elements in any cell is {less_than_5}"
            f"\n - 10 elements in any cell is {less_than_10}"
            f"\n Then p-values are:"
        )

        l, r = proportion_confint(count=fisher_positives, nobs=valid_tables, alpha=0.10, method='wilson')
        print(f"Fisher positives: {fisher_positives/valid_tables:.3f} ± {(r - l) / 2:.3f}")

        l, r = proportion_confint(count=chi2_positives, nobs=valid_tables, alpha=0.10, method='wilson')
        print(f"Chi2 positives: {chi2_positives/valid_tables:.3f} ± {(r - l) / 2:.3f}")

        if not shape == (2, 2):
            continue

        l, r = proportion_confint(count=chi2_yates_positives, nobs=valid_tables, alpha=0.10, method='wilson')
        print(f"Chi2 Yates positives: {chi2_yates_positives/valid_tables:.3f} ± {(r - l) / 2:.3f}")

According to frequently encountered requirements in the literature regarding expected cell counts for chi-squared test application, a common rule is at least 5 (some requires 10) in all cells of 2x2 table, and 5 or more in 80% of cells in larger tables, but no cells with zero expected count. Furthermore, when the assumption for 2x2 table is not met, Yates’s correction is applied.

Now, we will check the feasibility of these conditions for 2x2 tables first.

Correctness

Code

np.random.seed(26)

chi_squared_challenge(aa_test=True)

If out of 969 valid 2x2 tables (w/o zero expected count) number of tables with less than:
 - 5 elements in any cell is 915
 - 10 elements in any cell is 969
 Then p-values are:
Fisher positives: 0.019 ± 0.007
Chi2 positives: 0.043 ± 0.011
Chi2 Yates positives: 0.011 ± 0.006

If out of 993 valid 2x2 tables (w/o zero expected count) number of tables with less than:
 - 5 elements in any cell is 566
 - 10 elements in any cell is 993
 Then p-values are:
Fisher positives: 0.024 ± 0.008
Chi2 positives: 0.050 ± 0.011
Chi2 Yates positives: 0.015 ± 0.006

If out of 1000 valid 2x2 tables (w/o zero expected count) number of tables with less than:
 - 5 elements in any cell is 178
 - 10 elements in any cell is 993
 Then p-values are:
Fisher positives: 0.032 ± 0.009
Chi2 positives: 0.052 ± 0.012
Chi2 Yates positives: 0.022 ± 0.008

If out of 1000 valid 2x2 tables (w/o zero expected count) number of tables with less than:
 - 5 elements in any cell is 27
 - 10 elements in any cell is 831
 Then p-values are:
Fisher positives: 0.033 ± 0.009
Chi2 positives: 0.049 ± 0.011
Chi2 Yates positives: 0.022 ± 0.008

Power

Code

np.random.seed(26)

chi_squared_challenge(aa_test=False)

If out of 969 valid 2x2 tables (w/o zero expected count) number of tables with less than:
 - 5 elements in any cell is 785
 - 10 elements in any cell is 969
 Then p-values are:
Fisher positives: 0.300 ± 0.024
Chi2 positives: 0.359 ± 0.025
Chi2 Yates positives: 0.265 ± 0.023

If out of 993 valid 2x2 tables (w/o zero expected count) number of tables with less than:
 - 5 elements in any cell is 373
 - 10 elements in any cell is 987
 Then p-values are:
Fisher positives: 0.325 ± 0.024
Chi2 positives: 0.370 ± 0.025
Chi2 Yates positives: 0.295 ± 0.024

If out of 1000 valid 2x2 tables (w/o zero expected count) number of tables with less than:
 - 5 elements in any cell is 90
 - 10 elements in any cell is 897
 Then p-values are:
Fisher positives: 0.334 ± 0.025
Chi2 positives: 0.363 ± 0.025
Chi2 Yates positives: 0.305 ± 0.024

If out of 1000 valid 2x2 tables (w/o zero expected count) number of tables with less than:
 - 5 elements in any cell is 9
 - 10 elements in any cell is 586
 Then p-values are:
Fisher positives: 0.351 ± 0.025
Chi2 positives: 0.387 ± 0.025
Chi2 Yates positives: 0.327 ± 0.024

Surprise! In this example it’s shown there is no need for Yates nor for Fisher’s exact test at all! Chi-squared test doesn’t inflate the number of Type I errors and keep the power at least as high as it’s for the exact test regardless of the number of cells with low frequencies.

JFYI: Some other exact tests might be applied instead of Fisher’s test, e.g. Boschloo’s test provides higher power but it’s a) much slower and b) yet worse than a plain chi-squared, you may prove it on your own as an exercise. Hint: there is a function boschloo_exact in scipy

Okay, it’ clear with 2x2, but what if the table is getting bigger?

Code

np.random.seed(26)

chi_squared_challenge(shape=(2, 4), aa_test=False)

If out of 969 valid 2x4 tables (w/o zero expected count) number of tables with less than:
 - 5 elements in any cell is 804
 - 10 elements in any cell is 969
 Then p-values are:
Fisher positives: 0.685 ± 0.025
Chi2 positives: 0.688 ± 0.024

If out of 993 valid 2x4 tables (w/o zero expected count) number of tables with less than:
 - 5 elements in any cell is 388
 - 10 elements in any cell is 993
 Then p-values are:
Fisher positives: 0.666 ± 0.025
Chi2 positives: 0.672 ± 0.024

The power values are not statistically distinguishable, so chi-squared is still the winner as it’s much simpler in calculations and for the data model that I specified it seems that it can handle small table sizes - it takes low values at least as good as Fisher’s test.

Monte-Carlo Simulation Results Interpretation

I’d like to make an extra note that Monte-Carlo simulations can provide valuable insights into the performance and behavior of statistical tests, but it’s essential to interpret their results with caution and awareness of their limitations.

While simulations can mimic real-world scenarios, they are inherently limited by the assumptions and parameters used in their design. They may not capture the full complexity of real-world data and may not be generalizable to all situations. Therefore, it’s crucial to consider the specific context and limitations of the simulation when interpreting its results.

Saying that I must admit that I don’t have an intention to prove that there is no need for exact tests in any experiment design, I’d rather invite you to challenge your data and your experiments set up specifics, as there is a chance that you will find that chi-squared test is all you need for contingency experiments.

Justification for Fisher’s Exact over Chi-Squared

Even when the chi-squared test appears to perform well in Monte-Carlo simulations, there are compelling theoretical and practical reasons to prefer Fisher’s Exact Test in specific situations. Understanding these justifications is crucial for making informed decisions about which test to apply.

Sparse Table: Chi-squared test relies on approximations that may not hold true when dealing with sparse contingency tables.
Sample Size: Chi-squared test is based on the asymptotic distribution of the test statistic, which assumes that the sample size is large.
Effect Size: Chi-squared may be less sensitive in detecting the small effect size, because the approximation may not be as accurate.

So, once again: in order to guarantee that you don’t have a need for exact tests in your data model setting, you must consciously simulate your data distributions (especially when it comes to sparse tables, small sample sizes and small effect sizes) and then make a decision, the process that I presented here is based on the data my team is exposed to most and hopefully it might be easily simulated with Multinomial distribution.

General Pipeline

Finally I’d like to offer you a full pipeline on how to organize contingency tests efficiently: when Fisher’s exact test shall be applied and when Chi-squared is just enough.

As you know, exact tests could take time and what I want to achieve is to have a control over the time that I allocate to the function execution.

There are a few ways to implement timeouts, my favourite one is leveraging multiprocessing capabilities, however it’s not always the case that you can run a subprocess under your main process in production, so another concise way to apply timeouts will be shown via func_timeout library.

Concurrent Execution

Simple decorator to pass the output from the subprocess into main process.

Code

from typing import Optional
from functools import wraps
from multiprocessing import Queue, Process

def subprocess_output(procedure: object, queue: Optional[Queue]=None) -> object:

    @wraps(procedure)
    def wrapper(*args, **kwargs):

        p_value = procedure(*args, **kwargs)
        queue.put(p_value)

        return p_value

    return wrapper


def concurrent_test(
    method: object,
    table: list[list],
    name: str="NxM Fisher's exact test",
    timeout: int=10,
) -> float:
    """
    Runs the given method in a separate process with a timeout.
    If the process takes longer than the timeout, it is terminated.
    The result is returned from the main process.

    Parameters
    ----------
    method: object
        The method to be run in a separate process.
    table: list[list]
        Contingency matrix M x N
    name: str
        Process name
    timeout: int
        Time limit for subprocess execution

    Returns
    -------
    p-value: float
    """
    queue = Queue()
    procedure = subprocess_output(method, queue)
    p = Process(
        target=procedure,
        args=(table,),
        name=name
    )
    p.start()
    p.join(timeout=timeout)
    p.terminate()
    if p.exitcode is not None:
        p_value = queue.get()
        return p_value

Timeout Execution

Handy function that is a good solution if a timeout is the only thing you want to get from a subprocess

Code

# pip install func-timeout
from func_timeout import func_timeout, FunctionTimedOut


def timeout_test(
    method: object,
    table: np.ndarray,
    timeout: int=10,
) -> float:

    try:
        p_value = func_timeout(timeout, method, args=(table,))
    except FunctionTimedOut:
        p_value = None

    return p_value

By the way: there is no need for timeout when running scipy Fisher’s exact test, so it’s applied only to those methods analyzed in the chapter above.

Code

%%time
fisher_exact(np.array([[10000, 4000], [12000, 5000]]))

CPU times: total: 0 ns
Wall time: 14 ms

SignificanceResult(statistic=1.0416666666666667, pvalue=0.10488212218194087)

Universal procedure

Logic

Code

from scipy.stats import chi2_contingency, fisher_exact


def _contingency_test(table: np.ndarray, criterion: str, timeout: int) -> dict:
    """
    Performs a test of independence of variables in a contingency table

    Parameters
    ----------
    table: np.ndarray
        Contingency matrix M x N
    criterion: str
        Test to be performed
    timeout: int
        Time limit for Fisher's exact test

    Returns
    -------
    dict: {
        "method": str,
        "p-value": float,
        "error": str (optional)
    }
    """
    if criterion not in {"chi-squared", "fisher-exact", "textbook"}:
        raise ValueError(
            "Incorrect type of criterion, "
            "should be one of the following: 'chi-squared', 'fisher-exact', 'textbook'"
        )

    # No timeout if "fisher-exact" criterion is set
    timeout = timeout + 10_000 * (criterion == "fisher-exact")

    result = dict.fromkeys(["method", "p-value"])

    try:
        if criterion == "textbook" and table.shape == (2, 2) and (table >= 10).all():
            result["method"] = "2x2 Pearson's chi-squared test with Yates"
            test = chi2_contingency(table, correction=True)
            result["p-value"] = test.pvalue
        elif criterion != "chi-squared" and table.shape == (2, 2):
            result["method"] = "2x2 Fisher's exact test in Python"
            test = fisher_exact(table)
            result["p-value"] = test.pvalue
        elif criterion == "chi-squared" or (criterion == "textbook" and np.sum(table >= 5) >= np.size(table) * 0.80):
            result["method"] = "NxM Pearson's chi-squared test w/o Yates"
            test = chi2_contingency(table, correction=False)
            result["p-value"] = test.pvalue
        else: # try Exact fisher test if doesn't take too much
            if np.size(table) > 10:
                name = "NxM Fisher's exact test in R"
                p_value = timeout_test(
                    R_fisher_exact_test, table, timeout
                )
            else:
                name = "NxM Fisher's exact test in Python"
                p_value = timeout_test(
                    NxM_Fisher_exact_test, table, timeout
                )
            result["method"] = name
            if p_value:
                result["p-value"] = p_value
            else:
                result["method"] = " ".join([
                    result["method"],
                    "timed out.",
                    "NxM Pearson's chi-squared test approximation applied"
                ])
                test = chi2_contingency(table, correction=False)
                result["p-value"] = test.pvalue

    except Exception as error:
        result["error"] = f"{error}"

    return result

Here is a quick example of how it works with the identified table

Code

t = np.array([row[:2] for row in table[:5]])

_contingency_test(t, 'textbook', 5)

{'method': "NxM Pearson's chi-squared test w/o Yates",
 'p-value': 0.00032439327665678783}

Input validation

Adding a validation is an important step to prevent end-users from inference the function in the wrong way

Code

def _validate_input(table: list[list]) -> np.array:

    try:
        array = np.array(table)
    except ValueError:
        raise ValueError(
            "Contingency table's rows must be of equal length."
        )

    try:
        array = array.astype(dtype=int)
    except ValueError:
        raise ValueError(
            "All cells must contain integer numbers."
        )

    if array.ndim != 2:
        raise ValueError(
            "Contigency table must be a 2-dimensional array."
        )

    if (array < 0).any():
        raise ValueError(
            "All cells must contain non-negative numbers."
        )

    if array.shape[0] == 2 and np.max(np.min(array, axis=1)) == 0:
        raise ValueError(
            "There are cells with zero expected count. "
            "Expectations must contain only positive numbers."
        )

    return array

Put it all together

Code

def general_contingency_test(table: list[list], criterion='textbook', timeout: int=10) -> dict:
    """
    Performs a test of independence of variables in a contingency table

    Parameters
    ----------
    table: list[list]
        matrix M x N,
        where M is the number of compared groups and N is the set of measures
    timeout: int
        Time limit for Fisher's exact test,
        if the calculation takes longer chi-squared test is applied instead

    Returns
    -------
    dict: {
        "method": str,
        "p-value": float,
        "error": str (optional)
    }
    """
    return _contingency_test(_validate_input(table), criterion, timeout)

Now when we have a general procedure, let’s take a look at a few examples of the inference

Code

general_contingency_test(table, criterion='chi-squared')

{'method': "NxM Pearson's chi-squared test w/o Yates",
 'p-value': 0.0020940807559433087}

Code

general_contingency_test([[1, 2, 3, 5, 6, 100, 2000], [4, 5, 6, 7, 8, 150, 1000]], timeout=10)

{'method': "NxM Fisher's exact test in R timed out. NxM Pearson's chi-squared test approximation applied",
 'p-value': 5.060441099877772e-17}

The logic wrapped into the library

Good news for you, you don’t need to repeat all the code that was shared above as it’s already a part of the public Python package podlozhnyy_module that comes hande every time you have data analysis assignments at work.

Key Features of the Library:

Automatic Test Selection: By default, the library automatically selects the most appropriate test based on textbook rules, considering factors such as sample size and expected cell frequencies. This intelligent selection process ensures the validity and accuracy of the results.
Flexibility and Control: Users have the flexibility to override the automatic selection and force the library to apply a specific test if desired. This feature is particularly useful when researchers have prior knowledge or preferences regarding the test to be used.
User-Friendly Interface: The library’s interface is designed to be intuitive and easy to use, enabling researchers to effortlessly perform contingency tests without extensive coding or technical expertise.

# !pip install podlozhnyy-module==2.6-alpha

import podlozhnyy_module as pm

Library’s application is as simple as the following command

pm.contingency.general_contingency_test(table[:5], criterion='fisher-exact', timeout=1)

{'method': "NxM Fisher's exact test in R", 'p-value': 0.0011288225617825118}

pm.contingency.general_contingency_test(table[:3], criterion='fisher-exact', timeout=1)

{'method': "NxM Fisher's exact test in Python",
 'p-value': 0.0008451539443552633}

Conclusion: A Unified Framework for Contingency Tests

This notebook has explored various aspects of contingency tests, with a particular focus on Fisher’s Exact Test comparison to Chi-squared test. As a part of the journey we have contrasted the Python and R implementations of the Fisher’s test, highlighting their strengths and weaknesses in terms of performance, accuracy, and computational considerations.

The powerful framework of Monte-Carlo simulations is provided to enable the readers to simulate their data and ultimately apply proper testing techniques in their field providing the accurate guidance to the business.

podlozhnyy_module library: a flexible solution

The code presented in this notebook has been thoughtfully integrated into the podlozhnyy_module library, offering a flexible and user-friendly solution for conducting contingency tests. This library empowers users to select the most appropriate test based on textbook rules or to override the default behavior and force the application of a specific test, such as Fisher’s Exact or the Chi-squared test.

Contingency Tests Overview#

Background: Fisher’s Exact and Chi-Squared#

Fisher’s Exact#

Chi-Squared#

Fisher’s Exact Test Implementation#

Pythonic Fisher NxM#

R Fisher NxM#

Algorithmic Differences: Python vs R#

Performance Comparison#

Accuracy Comparison#

Performance Analysis: A Comparative Benchmark#

Correctness#

Power#

Monte-Carlo Simulation Results Interpretation#

Justification for Fisher’s Exact over Chi-Squared#

General Pipeline#

Concurrent Execution#

Timeout Execution#

Universal procedure#

Logic#

Input validation#

Put it all together#

The logic wrapped into the library#

Conclusion: A Unified Framework for Contingency Tests#

Contingency Tests Overview

Background: Fisher’s Exact and Chi-Squared

Fisher’s Exact

Chi-Squared

Fisher’s Exact Test Implementation

Pythonic Fisher NxM

R Fisher NxM

Algorithmic Differences: Python vs R

Performance Comparison

Accuracy Comparison

Performance Analysis: A Comparative Benchmark

Correctness

Power

Monte-Carlo Simulation Results Interpretation

Justification for Fisher’s Exact over Chi-Squared

General Pipeline

Concurrent Execution

Timeout Execution

Universal procedure

Logic

Input validation

Put it all together

The logic wrapped into the library

Conclusion: A Unified Framework for Contingency Tests