GroveStreams

Duplicates or near duplicate data

pat
Hi,

I am just starting to use GS and looking at the best way to feed it.

1. How would GS handle duplicate data (exactly the same value and time coming from the same sensor via different gateways)?

2. How would near duplicates (same date but a time +/- 2 minutes) be handled in GS? Ideally I would like to keep only one sample within any 4-minute window for a sensor sending exactly the same payload.

3. If all my data is uploaded in real time to Amazon RDS (Postgres), what would be the best way to push it to GS? Using this Postgres database I can avoid the duplicate and time-range problems (points 1 and 2) and keep a data backup.

Thank you for your help.

JV
MikeMills
1. Last call wins. Data is organized via the keys used in the Feed PUT API call:
* Organization (determined via the secret API key)
* Component ID
* Stream ID
* Sample Time (to the millisecond)

If all of those keys are the same for different gateway calls, then the last call overwrites the earlier calls' data. No corruption will occur: we lock everything at the component level during each call, and other calls wait until the first lock is released.
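
For illustration, here is a minimal Python sketch of such a Feed PUT call using the requests library. The endpoint URL and parameter names (compId, streamId, data, time, api_key) are assumptions modeled on the keys listed above, and the component/stream IDs are hypothetical; check the GS API docs for the exact call.

import time
import requests

API_KEY = "YOUR_SECRET_API_KEY"                  # identifies the organization
BASE_URL = "https://grovestreams.com/api/feed"   # assumed endpoint

def put_sample(component_id, stream_id, value, sample_time_ms):
    # One sample, keyed by organization / component / stream / sample time.
    # Re-sending the same four keys simply overwrites the stored value.
    resp = requests.put(BASE_URL, params={
        "compId": component_id,
        "streamId": stream_id,
        "data": value,
        "time": sample_time_ms,   # epoch milliseconds
        "api_key": API_KEY,
    })
    resp.raise_for_status()

# Two gateways uploading the identical sample: the second call
# overwrites the first, so only one copy is stored.
now_ms = int(time.time() * 1000)
put_sample("sensor42", "temperature", 21.5, now_ms)   # gateway A
put_sample("sensor42", "temperature", 21.5, now_ms)   # gateway B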

2. If you define your stream as a 'Regular' stream, then all samples with unique times to the millisecond are saved.

If you define your stream as an 'Interval' stream with a cycle of 4 minutes, then only the last sample to arrive within that cycle span will be saved. An interval (defined by a cycle) has a start date and an end date and can hold only one value.

As you learn GS, you'll realize that a 'Regular' stream has many virtual 'Interval' streams, so you could use a 'Regular' stream and then just use its 4-minute cycle for displaying, graphing, alerting and such. If more than one sample exists per cycle, then your default rollup method (sum, min, max, first, last, avg) determines how all of those samples are combined into one interval.
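
If you'd rather collapse near duplicates before they reach GS, one client-side trick (not a GS feature, just building on the last-call-wins behavior above) is to snap each sample time to the start of its 4-minute cycle before uploading, so near duplicates share the same sample-time key:

CYCLE_MS = 4 * 60 * 1000   # 4-minute cycle, in milliseconds

def snap_to_cycle(sample_time_ms):
    # Truncate the timestamp to the 4-minute boundary it falls in.
    return (sample_time_ms // CYCLE_MS) * CYCLE_MS

# Two samples 2 minutes apart with the same payload map to one key,
# so the second upload overwrites the first (last call wins).
t1 = 1_700_000_000_000        # hypothetical epoch milliseconds
t2 = t1 + 2 * 60 * 1000
assert snap_to_cycle(t1) == snap_to_cycle(t2)

Note that two samples close in time can still straddle a cycle boundary and land in adjacent cycles, so this approximates, rather than exactly enforces, a +/- 2-minute rule.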

3. Not familiar with Amazon RDS, but I would guess there are two options:
a) If Amazon has an export-to-file option, do that and then import the file via the GS Import Profile logic. You may have to transform the file to match our CSV format.
b) If Amazon has an API, then use it to access the data and pass the data into your GS organization via the GS API.
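
As a sketch of option b), assuming the RDS data is reachable with a standard Postgres driver such as psycopg2 (the host, table, and column names below are hypothetical, and put_sample() is the helper sketched earlier):

import psycopg2

conn = psycopg2.connect(
    host="your-instance.rds.amazonaws.com",   # hypothetical RDS endpoint
    dbname="sensors", user="reader", password="...",
)
with conn, conn.cursor() as cur:
    # Rows are already de-duplicated in Postgres (points 1 and 2).
    cur.execute(
        "SELECT component_id, stream_id, value, sample_time "
        "FROM sensor_data WHERE uploaded = false"
    )
    for comp_id, stream_id, value, ts in cur.fetchall():
        put_sample(comp_id, stream_id, value, ts)   # ts in epoch ms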