Thread Subject:
Removing Outliers

Subject: Removing Outliers

From: ln

Date: 18 Feb, 2005 01:06:48

Message: 1 of 23

Hi, I have a set of data (1912 points with x, y, z coordinates). The data
have outliers, and I remove them using the method shown in the MATLAB help
file: if a data point is > 3 sigma, it is removed.
   The problem shows up when I plot the data versus time (in seconds).
When a data point is removed, its time is removed too. So if, for
example, I want the coordinate at 40 sec, the times are no longer
correct, because everything after a removed outlier has shifted.
   Does anyone know how to keep the observation times intact after
the outliers have been removed?

Please help. Thanks!

Subject: Removing Outliers

From: Michael Robbins

Date: 18 Feb, 2005 01:14:29

Message: 2 of 23

ln wrote:
>
>
> Hi, i hv a set of data (1912 points in coordinate x,y,z). The data
> hv
> outliers & i remove it using the method that show in matlab help
> file, which if the data > 3 sigma, the data will b remove.
> The question is when i plot the data versus time (in second) :
> time domain, the data which already remove, the time will also b
> remove. So, example if i want to take the coordinate point in 40
> sec,
> the time are not correct already, cos all the time are shift by the
> removing outliers.
> So, does anyone know how to maintain the time observation
> although
> v already remove the outliers? The problem in here is the time will
> shift or change cos b the removing outliers.
>
> Pls help. Thanks!

Instead of removing the outliers, replace them with NaN.

Instead of

a(a>3.*sigma)=[];

do this

a(a>3.*sigma)=NaN;

or if your particular version of MATLAB doesn't like that

i=find(a>3.*sigma);
a(i)=NaN.*ones(size(i));
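
To show why the NaN replacement keeps the time axis intact, here is a minimal sketch with made-up data. The names t, a, sigma, isOut and a7, the 1 s sample spacing, and the signal itself are assumptions for illustration only. The test is applied to the deviation from the mean, which is one common reading of the 3 sigma rule.

```matlab
% Illustrative data (assumed): 20 samples at 1 s spacing with one spike.
t = (0:19)';                    % time stamps in seconds
a = 1 + 0.1 .* sin(t);          % smooth made-up signal
a(6) = 50;                      % spike at t = 5 s

% Flag points more than 3 standard deviations away from the mean.
sigma = std(a);
isOut = abs(a - mean(a)) > 3 .* sigma;

% Replace instead of delete, so a(k) still belongs to t(k).
a(isOut) = NaN;

% The sample for t = 7 s is still found at the same index as before.
a7 = a(t == 7);
```

Calling plot(t, a) on the result leaves a gap at the NaN rather than shifting the later points to earlier times.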

Subject: Removing Outliers

From: ln

Date: 18 Feb, 2005 01:36:57

Message: 3 of 23

Hi, Michael. Thanks for your comment. I have replaced the outliers with
NaN, but I need to calculate the RMS & mean value of the data. With NaN
values in the data, the RMS & mean become NaN too!!
   That's another problem..!! So, how do I handle it??

Subject: Removing Outliers

From: ln

Date: 18 Feb, 2005 01:53:05

Message: 4 of 23

Sorry, I have another problem: I need to set the XLim & YLim values
for the graph. When I plot, I get this error:

>> Bad value for axes property: 'YLim'
Values must be increasing and non-NaN.

Subject: Removing Outliers

From: Michael Robbins

Date: 18 Feb, 2005 02:18:26

Message: 5 of 23

ln <ltskin82@yahoo.com> wrote in news:eefc415.2@webx.raydaftYaTP:

> Sorry, i hv another problem is i need to set the Xlim & Ylim value
> for the graph. The result when i plot is hv error:
>
>>> Bad value for axes property: 'YLim'
> Values must be increasing and non-NaN.
>

You will have to substitute NaN-tolerant functions for the ones you are
using

nanmean(a)

or remove the NaNs when you use the intolerant functions

mean(a(~isnan(a)))
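
A minimal sketch of the second option, extended to the RMS and the axis limits asked about above. The data and the names ok, m, r and yl are made up for illustration. It uses only base functions, since nanmean comes from the Statistics Toolbox as far as I know.

```matlab
a = [1 2 NaN 4 NaN 3];        % illustrative data with two NaN-marked outliers

ok = ~isnan(a);               % logical mask of the usable samples
m  = mean(a(ok));             % mean over the usable samples only
r  = sqrt(mean(a(ok).^2));    % RMS over the usable samples only

% Axis limits must be finite and increasing, so take them from the valid data.
yl = [min(a(ok)) max(a(ok))];
```

After plotting, set(gca, 'YLim', yl) should then be accepted, because yl contains no NaN.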

Subject: Removing Outliers

From: ln

Date: 18 Feb, 2005 05:17:15

Message: 6 of 23

Thanks a lot, Michael! That problem is solved. But now I need to plot
the data in the frequency domain using fft. After I remove the outliers,
my data contains NaN values.
    So how do I run fft on data that contains NaN values? When I
write:

    EWd = EW - nanmean(EW);
    FFT_EW = fft(EWd,N_FFT);
    ....

    every element of FFT_EW is NaN, so there is nothing to plot. Can
anyone help, please?

Thanks!

Subject: Removing Outliers

From: Dave Robinson

Date: 18 Feb, 2005 05:52:46

Message: 7 of 23

ln wrote:
>
>
> Thanks a lot, Michael! The problem already solve. But now I need
> plot
> the data in frequency domain using fft. After I remove the
> outliers,
> my data contains "NaN" value.
> So how I plot my data that contains "NaN" value by fft? When I
> write:
>
> EWd = EW - nanmean(EW);
> FFT_EW = fft(EWd,N_FFT);
> ....
>
> The result for FFT_EW all is NaN. Can't show the result. Can
> anyone help, please?
>
> Thanks!

Having computed the mean value using your NaN-tolerant mean function,
scan your data for NaN values and replace them with the mean value; a bit
cheeky, but it might just get you what you are after.

Regards

Dave Robinson

Subject: Removing Outliers

From: ln

Date: 18 Feb, 2005 11:17:43

Message: 8 of 23

Hi Robinson, thanks for your comment. But I'm not very familiar with
MATLAB functions. How do I scan my data for NaN & replace it with the
mean value?

    I tried:
           find(EW == NaN);
           ...

But the answer is: ans = []

Sorry, I need help again...

Subject: Removing Outliers

From: Steven Lord

Date: 18 Feb, 2005 12:21:30

Message: 9 of 23


"ln" <ltskin82@yahoo.com> wrote in message
news:eefc415.6@webx.raydaftYaTP...
> Hi Robinson, thanks your comment. But I'm not so familiar with matlab
> function, how I scan my data for NaN & replace it by the mean value?
>
> I try:
> find(EW == NaN);
> ...
>
> But the answer is: ans = []
>
> Sorry, need help again...

See HELP ISNAN.

--
Steve Lord
slord@mathworks.com
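
For anyone following along, a minimal sketch of what the ISNAN pointer leads to, combined with the mean-fill idea from earlier in the thread. NaN never compares equal to anything, itself included, which is why find(EW == NaN) comes back empty. The data, the FFT length, and the name bad are assumptions for illustration.

```matlab
EW = [0.2 0.5 NaN 0.1 -0.3 NaN 0.4 0.0];   % illustrative series with NaN gaps
N_FFT = 8;                                 % assumed FFT length

bad = isnan(EW);                 % EW == NaN is never true; isnan is the right test
EWd = EW - mean(EW(~bad));       % remove the mean of the valid samples
EWd(bad) = 0;                    % mean-fill: the de-meaned series has mean 0

FFT_EW = fft(EWd, N_FFT);        % no NaN left, so the spectrum is finite
```

Filling with the mean before subtracting it, or with zero after, gives the same de-meaned series.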

Subject: Removing Outliers

From: Carlos Lopez

Date: 18 Feb, 2005 12:38:03

Message: 10 of 23

Dave Robinson wrote:
> Having computed the mean value using you NaN tolerant mean
> function,
> scan your data for Nan, and replace them with the mean value; a bit
> cheeky, but it might just get you what you are after.
Dave is right, but beware that you are modifying the
underlying statistical distribution of your population. If you are
working on a problem that requires extremely accurate
answers, you might be introducing an error.
There might be other options for assigning the missing values; look
for "imputation" if you need them.
Regards
Carlos
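
One simple imputation option, sketched here as an assumption rather than as what Carlos had in mind, is to interpolate linearly across each gap from the valid neighbours using interp1. The data and the names ok and EW_filled are illustrative.

```matlab
t  = 0:7;                                  % time in seconds (assumed 1 s spacing)
EW = [0 1 NaN 3 4 NaN NaN 7];              % illustrative series with gaps

ok = ~isnan(EW);                           % mask of valid samples
EW_filled = EW;
% Linearly interpolate the missing samples from their valid neighbours.
EW_filled(~ok) = interp1(t(ok), EW(ok), t(~ok), 'linear');
```

A gap at the very start or end of the series would stay NaN with this call, because linear interp1 does not extrapolate unless asked to.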

Subject: Removing Outliers

From: Brett Shoelson

Date: 1 Mar, 2005 12:37:12

Message: 11 of 23


"ln" <ltskin82@yahoo.com> wrote in message
news:eefc415.-1@webx.raydaftYaTP...
> Hi, i hv a set of data (1912 points in coordinate x,y,z). The data hv
> outliers & i remove it using the method that show in matlab help
> file, which if the data > 3 sigma, the data will b remove.
> The question is when i plot the data versus time (in second) :
> time domain, the data which already remove, the time will also b
> remove. So, example if i want to take the coordinate point in 40 sec,
> the time are not correct already, cos all the time are shift by the
> removing outliers.
> So, does anyone know how to maintain the time observation although
> v already remove the outliers? The problem in here is the time will
> shift or change cos b the removing outliers.
>
> Pls help. Thanks!

In addition to all of the help you have gotten, you might be interested in
my function deleteoutliers, available at the FEX.
Cheers,
Brett

Subject: Removing Outliers

From: the cyclist

Date: 1 Mar, 2005 13:14:33

Message: 12 of 23

ln wrote:

> Hi, i hv a set of data (1912 points in coordinate x,y,z). The data
> hv
> outliers & i remove it using the method that show in matlab help
> file, which if the data > 3 sigma, the data will b remove.

I don't know which help file you read, but identifying any data point
at > 3 sigma as an outlier is probably not a good idea.

Suppose your distribution is approximately gaussian. Because you
have 1912 data points, you should expect about 5 of your data points
to be > 3 sigma, just due to normal variance. It would be a shame
to discard those as outliers.
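
The "about 5" figure can be checked with a short calculation, assuming a two-sided test on a standard normal distribution. The names N, p and expected are illustrative.

```matlab
N = 1912;                    % number of samples mentioned in the thread
p = erfc(3 / sqrt(2));       % two-sided P(|z| > 3) for a standard normal
expected = N * p;            % expected count beyond 3 sigma by chance alone
```

Here p comes out near 0.0027, so expected is a little over 5.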

Subject: Removing Outliers

From: ln

Date: 1 Mar, 2005 22:32:24

Message: 13 of 23

Brett Shoelson wrote:
>
> In addition to all of the help you have gotten, you might be
> interested in
> my function deleteoutliers, available at the FEX.
> Cheers,
> Brett
>

   Hi, Brett. Thanks for your help. I have tried your function, but it
doesn't detect any outliers in my data, and I don't know why. The data
really do have outliers: in the time domain graph, some points clearly
jump away from the others. If they are not removed, they will affect
the results in the frequency domain.
   Any ideas?

ln,
kin

Subject: Removing Outliers

From: ln

Date: 1 Mar, 2005 22:47:35

Message: 14 of 23

the cyclist wrote:
 
> I don't know which help file you read, but identifying any data
> point
> at > 3 sigma as an outlier is probably not a good idea.
>
> Suppose your distribution is approximately gaussian. Because you
> have 1912 data points, you should expect about 5 of your data
> points
> to be > 3 sigma, just due to normal variance. It would be shame
> to discard those as outliers.

   Hi, the cyclist. Thanks for your comment. The help file I read
is here:

 <http://www.mathworks.com/access/helpdesk/help/techdoc/math/datafun8.html>

   Does that mean the method in the help file can't be applied to my
data of 1912 points? The data are taken in real time and have to be
plotted directly in time domain & frequency domain graphs. So I
don't make any adjustments; I only remove the outliers.
   Are there any ideas or methods suitable for my case? Or is there
any function in MATLAB that can help..?

Thanks for all your comments!!
ln,
kin

Subject: Removing Outliers

From: the cyclist

Date: 2 Mar, 2005 09:32:14

Message: 15 of 23

ln wrote:
>
>
> the cyclist wrote:
>
>> I don't know which help file you read, but identifying any data
>> point
>> at > 3 sigma as an outlier is probably not a good idea.
>>
>> Suppose your distribution is approximately gaussian. Because you
>> have 1912 data points, you should expect about 5 of your data
>> points
>> to be > 3 sigma, just due to normal variance. It would be shame
>> to discard those as outliers.
>
> Hi, the cyclist. Thanks for your comment. The help file I read
> from:
>
> <http://www.mathworks.com/access/helpdesk/help/techdoc/math/datafun8.html>
>
> Is that the method in the help file can't apply for my data that
> contain 1912 points? Because the data I taken is in real-time, &
> have
> to plot in directly in time domain & frequency domain graph. So, I
> not do any adjustment, just remove the outliers only.
> Is that have any ideas or methods suitable for my case to solve
> the problem? Or any function in MATLAB can help it..?
>
> Thanks for all your comment!!
> ln,
> kin

Well, an outlier is usually an observation that one believes is
inconsistent with the rest of the data set. For example, if you are
making delicate measurements of noise levels, and someone drops a
wrench during a measurement, then that point is probably an outlier.

If you have no reason to believe that points > 3 sigma are "wrong"
in that way, you should not remove them just because they are far
away from the mean.

Subject: Removing Outliers

From: Brett Shoelson

Date: 2 Mar, 2005 10:23:02

Message: 16 of 23


"ln" <ltskin82@yahoo.com> wrote in message
news:eefc415.11@webx.raydaftYaTP...
> Brett Shoelson wrote:
>>
>> In addition to all of the help you have gotten, you might be
>> interested in
>> my function deleteoutliers, available at the FEX.
>> Cheers,
>> Brett
>>
>
> Hi, Brett. Thanks for your help. Sorry about that I already try
> your function file, but can't detect any outliers in my data. I dont
> know why. But the data really have outliers, because it can see in
> the time domain graph that have some points jump over the others. &
> If not remove it, it will influence the results in frequency domain.
> ???
>
> ln,
> kin

The algorithm for outlier deletion is pretty robust. If you can't
objectively remove them at a given alpha level, then I don't think you
should make the claim that they are definitely outliers. That said, though,
maybe you should increase the default alpha (0.05) in deleteoutliers.m and
re-evaluate your data. At some point, as you continue to increase alpha, you
WILL detect outliers. You'll just have less confidence in your results. Or
something like that.
Regards,
Brett

Subject: Removing Outliers

From: rogerio

Date: 2 Mar, 2005 19:38:33

Message: 17 of 23

I believe you can change the code to store the times at which the
outliers occurred. Filter the signal, then loop over the filtered
signal comparing against the saved times, so that you can insert each
removed time back into the signal with y = NaN.
Let me know if you get it working.
Another way is to use a particle filter, but if you don't know anything
about them, it would be a very big job to develop one to remove the
outliers. Please, if you find something really good for cleaning
outliers, let me know.
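
A rough sketch of that bookkeeping, using a logical mask in place of an explicit loop. The data, the stand-in outlier test, and the names tRemoved, tKept, yKept and yFull are all assumptions for illustration.

```matlab
t = (0:5)';                     % time stamps in seconds (made up)
y = [1 2 99 3 4 5]';            % made-up signal with a spike at t = 2 s
isOut = (y == 99);              % stand-in for whatever outlier test is used

tRemoved = t(isOut);            % remember when the outliers occurred
tKept = t(~isOut);              % times of the surviving samples
yKept = y(~isOut);              % shortened signal, as after deletion

% Rebuild a full-length signal with NaN at the remembered times.
yFull = NaN(size(t));
yFull(ismember(t, tKept)) = yKept;
```

This ends up with the same result as writing NaN into y directly, so the extra step only matters if the shortened signal is needed in between.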

ln wrote:
>
>
> Hi, i hv a set of data (1912 points in coordinate x,y,z). The data
> hv
> outliers & i remove it using the method that show in matlab help
> file, which if the data > 3 sigma, the data will b remove.
> The question is when i plot the data versus time (in second) :
> time domain, the data which already remove, the time will also b
> remove. So, example if i want to take the coordinate point in 40
> sec,
> the time are not correct already, cos all the time are shift by the
> removing outliers.
> So, does anyone know how to maintain the time observation
> although
> v already remove the outliers? The problem in here is the time will
> shift or change cos b the removing outliers.
>
> Pls help. Thanks!

Subject: Removing Outliers

From: ln

Date: 4 Mar, 2005 09:46:58

Message: 18 of 23

>> the cyclist wrote:
>>
>
> Well, an outlier is usually an observation that one believes is
> inconsistent with the rest of the data set. For example, if you
> are
> making delicate measurements of noise levels, and someone drops a
> wrench during a measurement, then that point is probably an
> outlier.
>
> If you have no reason to believe that points > 3 sigma are
> "wrong"
> in that way, you should not remove them just because they are far
> away from the mean.

    Hi, actually the data is continuous RTK-GPS data. So, can I use
3-sigma standard deviation rule for rejection of the outliers?

ln,
kin

Subject: Removing Outliers

From: the cyclist

Date: 4 Mar, 2005 14:45:10

Message: 19 of 23

ln wrote:

>>> the cyclist wrote:
>>>
>>
>> Well, an outlier is usually an observation that one believes is
>> inconsistent with the rest of the data set. For example, if you
>> are
>> making delicate measurements of noise levels, and someone drops a
>> wrench during a measurement, then that point is probably an
>> outlier.
>>
>> If you have no reason to believe that points > 3 sigma are
>> "wrong"
>> in that way, you should not remove them just because they are far
>> away from the mean.
>
> Hi, actually the data is continuous RTK-GPS data. So, can I use
> 3-sigma standard deviation rule for rejection of the outliers?
>
> ln,
> kin

I think you misunderstood my basic point. Just being > 3 sigma
away is NOT A GOOD CRITERION for labeling an outlier. Being out on
the tail of a (e.g. normal) distribution is NOT the same as being an
outlier. An outlier is a point that is a discrepancy, a point that
was somehow sampled incorrectly, and does not really belong in the
data sample at all.

To say it again in a different way:

If I sample one million points from a normal distribution, and I do
it PERFECTLY, a few thousand of them will be > 3 sigma. But none
of them are outliers, and none of them should be removed.

I am not sure I can be more helpful in your doing it the right way,
but I am trying to help you not do it the wrong way. 8-)

Subject: Removing Outliers

From: Brett Shoelson

Date: 4 Mar, 2005 15:04:50

Message: 20 of 23


"the cyclist" <thecyclist@gmail.com> wrote in message
news:eefc415.17@webx.raydaftYaTP...
> ln wrote:
>
>>>> the cyclist wrote:
>>>>
>>>
>>> Well, an outlier is usually an observation that one believes is
>>> inconsistent with the rest of the data set. For example, if you
>>> are
>>> making delicate measurements of noise levels, and someone drops a
>>> wrench during a measurement, then that point is probably an
>>> outlier.
>>>
>>> If you have no reason to believe that points > 3 sigma are
>>> "wrong"
>>> in that way, you should not remove them just because they are far
>>> away from the mean.
>>
>> Hi, actually the data is continuous RTK-GPS data. So, can I use
>> 3-sigma standard deviation rule for rejection of the outliers?
>>
>> ln,
>> kin
>
> I think you misunderstood my basic point. Just being > 3 sigma
> away is NOT A GOOD CRITERION for labeling an outlier. Being out on
> the tail of a (e.g. normal) distribution is NOT the same as being an
> outlier. An outlier is a point that is a discrepancy, a point that
> was somehow sampled incorrectly, and does not really belong in the
> data sample at all.
>
> To say it again in a different way:
>
> If I sample one million points from a normal distribution, and I do
> it PERFECTLY, a few thousand of them will be > 3 sigma. But none
> of them are outliers, and none of them should be removed.
>
> I am not sure I can be more helpful in your doing it the right way,
> but I am trying to help you not do it the wrong way. 8-)

Yes, but... if your data were really normally distributed, and if you
sampled one million data points, your representation of the sample space
with the data would not necessarily be significantly impaired by discarding
data 3+ SDs above and 3+ SDs below the mean. The notion of "outlier" does
NOT necessarily imply incorrect sampling. For instance, if I made 1000
atomic force microscopy (AFM) measurements of elasticity moduli on the
tectorial membrane (of the inner ear), occasionally my AFM tip might come to
rest on an errant piece of hard material (bone, for instance). My
measurement at that location is not incorrect, but the value would be
significantly higher than the mean. It would be an outlier, and it would
very likely be more than 3 SDs away from the mean. So discarding it based on
that criterion makes a good bit of sense if I want to measure only soft
tissue.
That said, Grubbs' test (implemented in DELETEOUTLIERS.m) is a
well-established test for detecting and omitting outlying data (from
normally distributed populations) based on the relative values of the
samples, and is highly regarded at NIST. Some references are cited in the
file.
Brett

Subject: Removing Outliers

From: the cyclist

Date: 4 Mar, 2005 15:31:26

Message: 21 of 23

Brett Shoelson wrote:

> "the cyclist" <thecyclist@gmail.com> wrote in message
> news:eefc415.17@webx.raydaftYaTP...

>> outlier. An outlier is a point that is a discrepancy, a point that
>> was somehow sampled incorrectly

<snip>

> The notion of
> "outlier" does
> NOT necessarily imply incorrect sampling. For instance, if I made
> 1000
> atomic force microscopy (AFM) measurements of elasticity moduli on
> the
> tectorial membrane (of the inner ear), occasionaly my AFM tip might
> come to
> rest on an errant piece of hard material (bone, for instance). My
> measurement at that location is not incorrect, but the value would
> be
> significantly higher than the mean. It would be an outlier, and it
> would
> very likely be more than 3 SDs away from the mean. So discarding it
> based on
> that criterion makes a good bit of sense if I want to measure only
> soft
> tissue.

I agree with what you say, and I oversimplified the case against
using > 3 sigma. You use it correctly, because you back it up
with some reasoning behind using it as a rejection criterion. (The
original poster did not.)

I won't quibble over the "incorrect sampling" remark, because I think
we both understand the issue. I did not mean to imply that an
inaccurate measurement had been made, but only that there needs to be
a reason to believe that a correct measurement somehow led to a
data point that is not in the statistical ensemble one is hoping to
measure. (As in your bone example.)

The reasoning in your example is perfect:

-- Do your sampling.
-- Be aware of instances where sample points may not be from the
distribution you want (outliers!).
-- Identify a statistic that correlates well (you expect) with these
outliers. You mention the Grubbs test. There are several others, as
I expect you know.
-- Remove the presumed outliers based on that statistic.

This is to be contrasted with what the original poster is doing (I
fear):

-- "I know some of my points are bad"
-- Remove points with > 3 sigma

Subject: Removing Outliers

From: Brett Shoelson

Date: 4 Mar, 2005 15:46:02

Message: 22 of 23


"the cyclist" <thecyclist@gmail.com> wrote in message
news:eefc415.19@webx.raydaftYaTP...
<SNIP>
> This is to be contrasted with what the original poster is doing (I
> fear):
>
> -- "I know some of my points are bad"
> -- Remove points with > 3 sigma

I'm okay with your summary.
Cheers,
Brett

Subject: Removing Outliers

From: ln

Date: 5 Mar, 2005 02:17:49

Message: 23 of 23

Thanks, Brett & the cyclist. I really appreciate your comments.
I have one more question: is it necessary to filter the
FFT spectrum in the frequency domain?

ln,
kin
