
Exactly


It definitely can be (I believe one of the academic papers covers that), but we haven't implemented anything yet. But it's definitely on the to-do list


Yeah, this is actually a good example of why it's important to add a bit to the raw Matrix Profile. The point is anomalous with respect to the pattern preceding it (the "sawtooth"), so in this case one needs to consider the whole Matrix Profile. It's a good callout in that the graph isn't a complete anomaly detection system; it more demonstrates how a single anomalous point can impact the Matrix Profile value.


Great, thanks for the context!


The nice thing about how the Matrix Profile is built is that you can slice up different regions of time to focus on your use case. To build the MP you start with an NxN matrix that lists the distance between every point (or technically N-m+1 x N-m+1), then find the overall closest distance for each point. However, we've found that first "updating" the NxN matrix allows you to do analyses like your two anomaly example.

In that case, you'd create a parameter "w" that specifies the boundary between when two matching points count as a pattern versus when enough time has elapsed that they should be considered two separate anomalies. In the NxN matrix, for the ith row you'd then set every value outside the [i-w, i+w] band to infinity. That way, the resulting Matrix Profile accounts for your situation.
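A rough sketch of that idea in NumPy (this is illustrative, not the library's actual API; the function name and the naive O(n^2) distance matrix are just for clarity):

```python
import numpy as np

def matrix_profile_with_boundary(ts, m, w):
    """Naive Matrix Profile with a 'w' boundary parameter.

    Hypothetical illustration: entries outside the [i-w, i+w] band are
    set to infinity, so two matching patterns that are far apart in time
    each show up as anomalies instead of as a motif pair.
    """
    n = len(ts) - m + 1
    # All length-m subsequences, stacked as rows of an (n x m) array.
    subs = np.array([ts[i:i + m] for i in range(n)])
    # Full (n x n) pairwise distance matrix between subsequences.
    dist = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=2)
    for i in range(n):
        # Standard trivial-match exclusion zone around i.
        lo, hi = max(0, i - m // 2), min(n, i + m // 2 + 1)
        dist[i, lo:hi] = np.inf
        # The 'w' boundary: only matches within i-w..i+w may count.
        dist[i, :max(0, i - w)] = np.inf
        dist[i, i + w:] = np.inf
    # The Matrix Profile is the closest remaining distance per row.
    return dist.min(axis=1)
```

With a small w, a pattern that repeats only once, days apart, has no nearby match and gets a high profile value at both occurrences, which is the "two anomalies" reading described above.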

Due to the algorithm's speed we do often sweep over multiple values, but we try to use domain knowledge where we can. As for alerting, we sometimes have labeled data we can calibrate the threshold against, but often it's a matter of customer trial and error.


You say that the algorithm is fast, and the literature certainly points to this too, but I tried the Python implementation (linked here) on some audio datasets. One second of audio at a reasonable quality is 44.1k data points, and it was taking minutes to process.

I also tried an R implementation that was multi-threaded and a lot faster, but even so the algorithm took ages when testing lots of different window sizes and datasets.


That's odd; we've definitely processed larger datasets much more quickly than that. Feel free to raise an issue on GitHub and we can take a look.


Just to expand on my previous comment: right now the most interesting feature to me is motif detection; however, the motifs are always incredibly short (or a fixed size?). Please excuse my ignorance in this regard, I'm not too versed in the algorithm. Are there any use cases where you have looked at longer motifs?


There's some interesting work around multi-month seasonality that covers what you're talking about, but Eamonn Keogh would probably have a better answer than I :). Also, regarding your performance issue, did you use the pip version or the code directly from GitHub?


I used the pip version.


Cool. Thanks for responding. I'll provide some example code and my specific use case.



Definitely! We see GPUs as the primary method of calculating the Matrix Profile given that Big Data is only getting bigger. So, advancing from STOMP-GPU to SCAMP will be very helpful.


Shoot, sorry about that! Definitely a typo.


The funny thing is that I wanted to know more about the project after seeing this page. I'm not sure I would have continued if it was just another time series library.


Also, it seems you've misspelled Eamonn Keogh's name and got 'Keough' instead.


Thanks for the heads up! We've made the correction.


No problem! We've updated the link.


I'm not affiliated with UCR, though I am a product of the UC system :)

I agree with Keogh that Matrix Profile can help solve a very wide range of problems, but you usually have to go a little bit deeper than just calculating the topline Matrix Profile. A good example of this is that if you calculate the Matrix Profile for something with daily seasonality (say, in-store retail sales), you'll see the same daily pattern in the Matrix Profile. The straightforward fix for this is to normalize by time window (say, only compare the Matrix Profile at the same time each day).

