Python: module mpyc.statistics

mpyc.statistics

github.com/lschoe/mpyc/blob/v0.10/mpyc/statistics.py

This module provides secure versions of common mathematical statistics functions. The module is modeled after the statistics module in the Python standard library, and as such is aimed at small-scale use ("at the level of graphing and scientific calculators").

Functions mean, median, median_low, median_high, quantiles, and mode are provided for calculating averages (measures of central location). Functions variance, stdev, pvariance, and pstdev are provided for calculating variability (measures of spread). Functions covariance, correlation, and linear_regression are provided for calculating statistics regarding relations between two sets of data.

Most of these functions work best with secure fixed-point numbers, but some effort is made to support the use of secure integers as well. For instance, the mean of a sample of integers is rounded to the nearest integer, which may still be useful. The variance of a sample of integers is also rounded to the nearest integer, but this will only be useful if the sample is properly scaled.

A baseline implementation is provided, favoring simplicity over efficiency. Also, the current implementations of mode, median, and quantiles favor a small privacy leak over a strict but less efficient approach.

If these functions are called with plain data, the call is relayed to the corresponding function in Python's statistics module.
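The remark about scaling integer samples can be illustrated with plain Python. The sketch below uses the standard library's statistics module, with Python's round() standing in for the rounding applied to secure integers. The scale factor 100 is an arbitrary choice for illustration, and how MPyC rounds exact ties is not covered here.

```python
import statistics

sample = [1, 2, 2, 3]

# The exact sample variance is 2/3; rounded to the nearest integer it
# becomes 1, which loses most of the information.
unscaled = round(statistics.variance(sample))

# Scale the sample first (hypothetical scale factor). The variance then
# scales by scale**2, so the rounded integer keeps four decimals.
scale = 100
scaled_sample = [scale * v for v in sample]
scaled = round(statistics.variance(scaled_sample))

# Undo the scaling to interpret the integer result.
recovered = scaled / scale**2

print(unscaled, scaled, recovered)
```

The same reasoning applies to the mean, although there the rounded result is often already usable without scaling.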

Modules

mpyc.asyncoro
mpyc.random
statistics
sys

Functions

correlation(x, y)
Return Pearson's correlation coefficient for x and y. Pearson's correlation coefficient takes values between -1 and +1. It measures the strength and direction of the linear relationship between x and y, where +1 means a very strong positive linear relationship, -1 a very strong negative linear relationship, and 0 no linear relationship.

covariance(x, y)
Return the sample covariance of x and y.
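As a plain-Python sketch of the definitions behind covariance and correlation (not MPyC's secure implementation), both quantities can be computed directly from deviations around the means. The helper names sample_covariance and pearson_correlation are hypothetical and exist only for this illustration.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

def pearson_correlation(x, y):
    """Pearson's r: covariance normalized by both standard deviations."""
    sxy = sample_covariance(x, y)
    sxx = sample_covariance(x, x)  # sample variance of x
    syy = sample_covariance(y, y)  # sample variance of y
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4]
y_up = [2, 4, 6, 8]    # increases linearly with x
y_down = [8, 6, 4, 2]  # decreases linearly with x

cov_up = sample_covariance(x, y_up)
r_up = pearson_correlation(x, y_up)
r_down = pearson_correlation(x, y_down)

print(cov_up, r_up, r_down)
```

For these perfectly linear samples the correlation is +1 and -1 respectively, up to floating-point rounding.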

fsum(seq, /)
Return an accurate floating point sum of values in the iterable seq. Assumes IEEE-754 floating point arithmetic.

linear_regression(x, y)
Return a (simple) linear regression model for x and y. The parameters of the model are returned as a named LinearRegression tuple, with two fields called "slope" and "intercept", respectively. A linear regression model describes the relationship between independent variable x and dependent variable y in terms of a linear function:

    y = slope * x + intercept + noise

Here, slope and intercept are the regression parameters estimated using ordinary least squares, and noise represents the variability of the data not explained by the linear regression (it is equal to the difference between predicted and actual values of the dependent variable).
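The ordinary least squares estimates mentioned here have a closed form for a single independent variable. The following is a minimal plain-Python sketch of that textbook formula, not MPyC's secure implementation; the function name ols_fit is hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept + noise."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return slope, intercept

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # exactly y = 2*x + 1, so the noise term is zero

slope, intercept = ols_fit(x, y)

# The noise is the difference between actual and predicted values.
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]

print(slope, intercept, residuals)
```

With noisy data the residuals would be nonzero, and the fitted line would minimize the sum of their squares.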

mean(data)
Return the sample mean (average) of data which can be a sequence or an iterable. If the data points are secure integers or secure fixed-point numbers, the mean value returned is of the same secure type, rounded to the nearest number. If data is empty, StatisticsError will be raised.

median(data)
Return the median of numeric data, using the common “mean of middle two” method. If data is empty, StatisticsError is raised. data can be a sequence or iterable. When the number of data points is even, the median is interpolated by taking the average of the two middle values.

median_high(data)
Return the high median of numeric data. If data is empty, StatisticsError is raised. data can be a sequence or iterable. The high median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned.

median_low(data)
Return the low median of numeric data. If data is empty, StatisticsError is raised. data can be a sequence or iterable. The low median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned.
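The three median variants differ only when the number of data points is even. The sketch below shows this with plain data and Python's standard statistics module, to which calls with plain data are relayed; secure types are not shown.

```python
import statistics

even_sample = [1, 3, 5, 7]
odd_sample = [1, 3, 5]

# Even number of points: median averages the two middle values,
# while median_low and median_high each pick one of them.
mid = statistics.median(even_sample)
low = statistics.median_low(even_sample)
high = statistics.median_high(even_sample)

# Odd number of points: all three return the middle value.
odd_results = (
    statistics.median(odd_sample),
    statistics.median_low(odd_sample),
    statistics.median_high(odd_sample),
)

print(mid, low, high, odd_results)
```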

mode(data)
Return the mode, the most common data point from discrete or nominal data. If there are multiple modes with the same frequency, the first one encountered in data is returned. If data is empty, StatisticsError is raised. To speed up the computation, the bit length of the sample range max(data) - min(data) is revealed, provided this range is not too small.
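A small plain-data sketch of the tie-breaking rule, using Python's standard statistics module (Python 3.8 or later). The last line computes the quantity described as being revealed, on the assumption that "bit length" has the same meaning as Python's int.bit_length(); the threshold for a range being "too small" is not stated here and is not modeled.

```python
import statistics

data = [12, 7, 12, 7, 9]

# Both 12 and 7 occur twice; the first one encountered wins.
most_common = statistics.mode(data)

# Bit length of the sample range (assumed to correspond to what the
# secure version reveals): the range is 12 - 7 = 5, which needs 3 bits.
range_bit_length = (max(data) - min(data)).bit_length()

print(most_common, range_bit_length)
```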

pstdev(data, mu=None)
Return the population standard deviation (square root of the population variance). See pvariance() for arguments and other details.

pvariance(data, mu=None)
Return the population variance of data, an iterable of at least two numbers. If the optional second argument mu is given, it is typically the mean of the data. It can also be used to compute the second moment around a point that is not the mean. If it is missing or None (the default), the arithmetic mean is automatically calculated. Use this function to calculate the variance from the entire population. To estimate the variance from a sample, the variance() function is usually a better choice. Raises StatisticsError if data is empty.

quantiles(data, *, n=4, method='exclusive')
Divide data into n continuous intervals with equal probability. Returns a list of n-1 cut points separating the intervals. Set n to 4 for quartiles (the default). Set n to 10 for deciles. Set n to 100 for percentiles, which gives the 99 cut points that separate data into 100 equal-sized groups. The data can be any iterable containing samples. The cut points are linearly interpolated between data points. If method is set to 'inclusive', data is treated as population data. The minimum value is treated as the 0th percentile (lowest quantile) and the maximum value is treated as the 100th percentile (highest quantile).
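To make the difference between the two methods concrete, here is a plain-data sketch using Python's standard statistics module, which this function is modeled after. The sample of eleven consecutive integers is chosen only so that the cut points are easy to check by hand.

```python
import statistics

data = list(range(1, 12))  # 1, 2, ..., 11

# Default method: the sample is treated as drawn from a larger population.
exclusive = statistics.quantiles(data, n=4)

# 'inclusive': the minimum and maximum are the 0th and 100th percentiles.
inclusive = statistics.quantiles(data, n=4, method='inclusive')

print(exclusive)
print(inclusive)
```

In both cases the result has n-1 = 3 cut points; the inclusive cut points lie closer to the middle of the sample.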

sqrt(x, /)
Return the square root of x.

stdev(data, xbar=None)
Return the sample standard deviation (square root of the sample variance). See variance() for arguments and other details.

variance(data, xbar=None)
Return the sample variance of data, an iterable of at least two numbers. If the optional second argument xbar is given, it should be the mean of data. If it is missing or None (the default), the mean is automatically calculated. Use this function when your data is a sample from a population. To calculate the variance from the entire population, see pvariance(). Raises StatisticsError if data has fewer than two values.
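The sample and population variants differ in their divisor (n-1 versus n). The sketch below uses plain data with Python's standard statistics module, to which calls with plain data are relayed, and also shows the optional mu argument of pvariance used with a point that is not the mean.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32

pop_var = statistics.pvariance(data)   # 32 / 8
pop_std = statistics.pstdev(data)      # square root of the above
samp_var = statistics.variance(data)   # 32 / 7

# Passing the known mean avoids recomputing it.
pop_var_mu = statistics.pvariance(data, mu=5)

# Second moment around a point other than the mean: 40 / 8.
second_moment = statistics.pvariance(data, mu=4)

print(pop_var, pop_std, samp_var, pop_var_mu, second_moment)
```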

Data

runtime = None