Mean excluding outliers
m = trimmean(X,percent)
m = trimmean(X,percent,flag)
m = trimmean(X,percent,flag,dim)
m = trimmean(X,percent) calculates the trimmed mean of the values in X. For a vector input, m is the mean of X, excluding the highest and lowest k data values, where k=n*(percent/100)/2 and where n is the number of values in X. For a matrix input, m is a row vector containing the trimmed mean of each column of X. For n-D arrays, trimmean operates along the first non-singleton dimension. percent is a scalar between 0 and 100.
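The trimming rule above can be sketched in base MATLAB for a case where k is an integer. This is a minimal illustration and not the toolbox implementation: the data values and the variable names (xs, mVec, mMat) are made up for the example, and trimmean itself is not called.

```matlab
% Sketch of the trimming rule when k is an integer (base MATLAB only).
x = [1 2 3 4 5 6 7 100];         % illustrative data with one large value
percent = 25;

n  = numel(x);
k  = n*(percent/100)/2;          % k = 1: drop one value from each end
xs = sort(x);
mVec = mean(xs(k+1:n-k));        % mean of 2,...,7, compare with mean(x)

% Matrix input: trim each column separately, giving a row vector.
X  = [x(:), 2*x(:)];             % second column is the first, doubled
Xs = sort(X, 1);                 % sort within each column
mMat = mean(Xs(k+1:n-k, :), 1);  % one trimmed mean per column
```

Here the ordinary mean of x is 16 because of the single large value, while the trimmed mean of the remaining six values is 4.5.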
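m = trimmean(X,percent,flag) controls how to trim when k, half the number of trimmed values, is not an integer. flag takes one of the following values:
| 'round' | Round k to the nearest integer (round to a smaller integer if k is a half integer). This is the default. |
| 'floor' | Round k down to the next smaller integer. |
| 'weight' | If k=i+f, where i is the integer part and f is the fraction, compute a weighted mean with weight (1-f) for the (i+1)th and (n-i)th values, and full weight for the values between them. |
m = trimmean(X,percent,flag,dim) takes the trimmed mean along dimension dim of X.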
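The three flag rules can be sketched in base MATLAB as follows. This is one reading of the descriptions above and is not taken from the toolbox source. The data and all variable names (kRound, kFloor, kInt, kFrac, w) are illustrative assumptions, and ceil(k - 0.5) is used here as one way to round half integers down.

```matlab
% Sketch of the 'round', 'floor', and 'weight' rules (base MATLAB only).
x = [1 2 3 4 5 6 7 8 9 10 11 12 50 100];   % made-up data
percent = 25;

n  = numel(x);
xs = sort(x);
k  = n*(percent/100)/2;          % k = 1.75, not an integer

% 'round': nearest integer, with half integers rounded down.
kRound = ceil(k - 0.5);          % 2
mRound = mean(xs(kRound+1:n-kRound));

% 'floor': next smaller integer.
kFloor = floor(k);               % 1
mFloor = mean(xs(kFloor+1:n-kFloor));

% 'weight': weight (1-f) on the two boundary values, full weight between.
kInt  = floor(k);                % integer part i
kFrac = k - kInt;                % fractional part f
w = ones(1, n-2*kInt);           % weights for xs(i+1), ..., xs(n-i)
w([1 end]) = 1 - kFrac;
mWeight = sum(w .* xs(kInt+1:n-kInt)) / sum(w);
```

With these values, 'round' removes two observations from each end and 'floor' removes one, so the large value 50 is dropped under 'round' but kept under 'floor'. The 'weight' result falls between the two.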
This example shows a Monte Carlo simulation of the efficiency of the 10% trimmed mean relative to the sample mean for normal data.
x = normrnd(0,1,100,100);
m = mean(x);
trim = trimmean(x,10);
sm = std(m);
strim = std(trim);
efficiency = (sm/strim).^2

efficiency =
    0.9702
Generate random data from the t distribution, which tends to have outliers:
rng('default')  % to reproduce the plot exactly
x = trnd(1,40,1);
probplot(x)
Although the distribution is symmetric around zero, the sample contains several outliers that pull the mean away from zero. The trimmed mean is much closer to zero and is therefore more representative of the data:
mean(x)

ans =
    2.7991

trimmean(x,25)

ans =
    0.8797
The trimmed mean is a robust estimate of the location of a sample. If there are outliers in the data, the trimmed mean is a more representative estimate of the center of the body of the data than the mean. However, if the data all come from a single distribution without heavy tails, for example a normal distribution, then the trimmed mean is less efficient than the sample mean as an estimator of the location of the data.
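The robustness property can be illustrated with a small deterministic sketch in base MATLAB. The data are made up, the variable names are illustrative, and the trimming is done by hand with k = 1 instead of by calling trimmean.

```matlab
% Sketch: effect of one outlier on the mean and on a 25% trimmed mean.
clean = [4 5 5 6 6 7 7 8];            % made-up data
dirty = clean;
dirty(end) = 800;                     % replace one value with an outlier

k  = numel(clean)*(25/100)/2;         % k = 1 value trimmed from each end
cs = sort(clean);
ds = sort(dirty);

shiftMean = mean(dirty) - mean(clean);                  % change in the mean
shiftTrim = mean(ds(k+1:end-k)) - mean(cs(k+1:end-k));  % change in the trimmed mean
```

In this example the outlier moves the mean by 99 and leaves the trimmed mean unchanged, because the outlier is one of the values removed before averaging.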