(All source available here)
Using my setup detailed here, I have been collecting accelerometer data off and on for a few weeks. It seems about time to show some very early results.
To summarize, we have a corpus of x, y, and z accelerometer data from the Pebble, sampled at 10 Hz. The data streams from the watch to an Android device and on to a MongoDB server. I will first describe the structure of the collected data and then show some actual plots.
Sensor Data Schema in Mongo
When the Android app POSTs a batch of readings, we dump all of them directly into Mongo as a single document, with a header that summarizes the time range collected. Mongo is not particularly good at handling time series data: storing every reading as its own document would result in far too many documents to index. An alternative scheme would be to keep one document per hour or day and keep updating it with each new reading that falls in that period. Unfortunately, this bloats the oplog, since the idempotency constraint requires the whole document to be rewritten in the oplog for every reading added.
Since updating or breaking up blocks of readings incurs overhead, we just insert the readings into the database as-is and do the sorting and organization when the data comes out. An example reading block with its header looks like this:
{
  "start_ts": 1122334455,
  "end_ts": 1122335566,
  "readings": [
    { "x": 1000, "y": -10, "z": 50, "ts": 1122334455, "v": false },
    ...
    { "x": 800, "y": -20, "z": 100, "ts": 1122335566, "v": false }
  ]
}
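As a minimal sketch of how such a header could be computed, assuming the server takes the smallest and largest ts in the POSTed batch (so the header stays correct even if readings within a batch arrive out of order), here is a hypothetical helper named make_reading_block. It is an illustration, not the original app's code:

```python
def make_reading_block(readings):
    """Wrap a batch of POSTed readings in a document with a time-range header.

    Hypothetical helper: uses min/max of 'ts' rather than the first and last
    readings, in case the batch is not in timestamp order.
    """
    if not readings:
        raise ValueError("cannot build a block from an empty batch")
    timestamps = [r["ts"] for r in readings]
    return {
        "start_ts": min(timestamps),
        "end_ts": max(timestamps),
        "readings": list(readings),
    }


# A batch whose readings arrive out of order
batch = [
    {"x": 800, "y": -20, "z": 100, "ts": 1122335566, "v": False},
    {"x": 1000, "y": -10, "z": 50, "ts": 1122334455, "v": False},
]
block = make_reading_block(batch)
print(block["start_ts"], block["end_ts"])  # 1122334455 1122335566
```

With pymongo, a block like this could then be stored with db.accel_data.insert_one(block).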
The catch is that the blocks of readings coming out of PebbleKit are not ordered, so we need to load every block that intersects the time period of interest, throw out the readings that fall outside the range, and sort the rest:
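import pymongo

# start and end bound the period of interest, in the same units as 'ts'
db = pymongo.MongoClient().watch_data
readings = []
# Load every block whose [start_ts, end_ts] range overlaps the period
for reading_block in db.accel_data.find({'start_ts': {'$lte': end},
                                         'end_ts': {'$gte': start}},
                                        sort=[('start_ts', pymongo.ASCENDING)]):
    # Keep only the readings that actually fall inside [start, end)
    readings += [r for r in reading_block['readings'] if start <= r['ts'] < end]
# Blocks are not ordered, so sort the combined readings by timestamp
readings.sort(key=lambda r: r['ts'])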
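To make the overlap logic concrete without a running Mongo server, here is a small in-memory sketch that applies the same conditions to a list of block dicts. The function name readings_in_range and the sample blocks are hypothetical, and start/end are assumed to be in the same units as ts:

```python
def readings_in_range(blocks, start, end):
    """Return readings with start <= ts < end, sorted by timestamp.

    Mirrors the Mongo query: a block can only contribute readings if its
    range overlaps the window (start_ts <= end and end_ts >= start).
    """
    readings = []
    for block in blocks:
        if block["start_ts"] <= end and block["end_ts"] >= start:
            readings += [r for r in block["readings"] if start <= r["ts"] < end]
    readings.sort(key=lambda r: r["ts"])
    return readings


blocks = [
    # Overlapping blocks whose readings are stored out of order
    {"start_ts": 200, "end_ts": 500,
     "readings": [{"ts": 500}, {"ts": 200}, {"ts": 300}]},
    {"start_ts": 100, "end_ts": 400,
     "readings": [{"ts": 400}, {"ts": 100}]},
    # Entirely outside the window below, so it is skipped
    {"start_ts": 900, "end_ts": 1000,
     "readings": [{"ts": 900}, {"ts": 1000}]},
]
print([r["ts"] for r in readings_in_range(blocks, 150, 450)])  # [200, 300, 400]
```

Both overlapping blocks contribute, but only their in-window readings survive, and the final sort restores time order across blocks.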
Now we have a list of reading dicts sorted by timestamp. Time to put them into a pandas DataFrame for some plotting.
Using Pandas to Process Time Series
We can now load the readings directly into a dataframe:
import pandas
df = pandas.DataFrame(readings)
Then set the index to actual times instead of the default integer counter. The ts values are milliseconds since the epoch, hence the division by 1000:
idx = pandas.to_datetime(df['ts'] / 1000, unit='s')
df = df.set_index(idx)
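Since ts holds integer milliseconds, pandas can also convert it directly with unit='ms', which skips the float division. A small sketch on made-up timestamps (the values below are invented for illustration) showing the resulting index:

```python
import pandas

# Made-up readings 100 ms apart, as expected at 10 Hz
df = pandas.DataFrame({
    "x": [1000, 990, 1010],
    "y": [-10, -12, -9],
    "z": [50, 55, 48],
    "ts": [1400000000000, 1400000000100, 1400000000200],
})

# Interpret the integer timestamps as milliseconds since the epoch
idx = pandas.to_datetime(df["ts"], unit="ms")
df = df.set_index(idx)
print(df.index[0])  # 2014-05-13 16:53:20
```

Either form gives a DatetimeIndex, which is what lets the plots below use real clock times on the x axis.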
We can then plot one line per accelerometer axis (the Pebble reports acceleration in thousandths of a g):
df[['x', 'y', 'z']].plot()
Here is an example plot of one night of sleep:
The flat lines are periods without movement; when I move in my sleep, the relative readings of the three axes change. Notice how the flat periods lengthen as the night goes on, presumably because I am sleeping more deeply.
As a very rough approximation, one would expect that the less movement there is during sleep, the more restful it is. To factor out the constant force of gravity and look only at the movement of the watch, we take the absolute value of the time-differenced series and add it up with a cumulative sum:
df[['x', 'y', 'z']].diff().abs().cumsum().plot()
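To see why differencing removes gravity, here is a toy sketch on invented z-axis values: a constant offset of about 1000 (roughly 1 g) plus a few small movements. The first value is NaN because there is no earlier reading to difference against:

```python
import pandas

# Invented z-axis readings: ~1 g of gravity plus occasional small movements
df = pandas.DataFrame({"z": [1000, 1000, 1010, 1000, 1000, 1030]})

# diff() cancels the constant gravity offset, abs() counts movement in either
# direction, and cumsum() accumulates it into an activity curve
activity = df["z"].diff().abs().cumsum()
print(activity.tolist())  # [nan, 0.0, 10.0, 20.0, 20.0, 50.0]

# Shifting every reading by a constant leaves the activity curve unchanged
shifted = (df["z"] + 500).diff().abs().cumsum()
```

Stretches with no change add nothing to the curve, so flat periods in the raw plot show up as flat steps in the accumulated activity.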
And we end up with accumulated activity over time:
You can see that “background”, low-amplitude movement contributes quite a bit to the linear slope, while short bursts of larger movement periodically bump the sum up. We will investigate this effect in more detail later. Another interesting point is that this plot shows much more accumulated z-axis change than x or y. Again, this is something to look at more closely later.
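One way to put a number on the per-axis difference is to take the final value of each cumulative curve, which is just the sum of absolute differences per axis. A sketch on made-up data in which z moves the most:

```python
import pandas

# Made-up readings where the z axis changes more than x and y
df = pandas.DataFrame({
    "x": [0, 5, 0, 5],
    "y": [-10, -10, -5, -10],
    "z": [1000, 1040, 990, 1030],
})

# Total absolute change per axis (sum() skips the leading NaN from diff())
totals = df[["x", "y", "z"]].diff().abs().sum()
print(totals.to_dict())  # {'x': 15.0, 'y': 10.0, 'z': 130.0}

# Each axis's share of the overall movement
shares = totals / totals.sum()
print(shares.round(3).to_dict())
```

On real data, comparing these shares across several nights would show whether the z-axis dominance is consistent or specific to one night.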
Another interesting question is whether we are really capturing all of the 10 Hz accelerometer data on its way from the watch to the phone to the database on the server. We can check this by time-differencing the timestamp column:
df[['ts']].diff().plot()
For this period, there was just one gap of about 7 seconds between readings; the rest of the time the gap stayed well below 1 second. We would expect 10 Hz readings to arrive 100 ms apart, and this looks about right.
That wraps up a first look at the data collected by my pipeline. Next time we will look at more periods of data, see what metrics we can develop to evaluate the effectiveness of sleep, and maybe even try some classification of different types of movement.
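Beyond eyeballing the plot, the same differenced timestamps can flag dropouts and estimate how many readings arrived. A sketch on invented millisecond timestamps with one 7-second gap; the 1-second threshold is an arbitrary choice for illustration, and on real data the series would be df['ts']:

```python
import pandas

# Invented timestamps in ms: 100 ms spacing with one ~7 s dropout
ts = pandas.Series([0, 100, 200, 300, 7300, 7400, 7500])

gaps = ts.diff()
# Flag gaps far longer than the 100 ms expected at 10 Hz
# (1000 ms is an arbitrary threshold chosen for this example)
dropouts = gaps[gaps > 1000]
print(dropouts.tolist())  # [7000.0]

# The median spacing should sit near 100 ms if the pipeline keeps up
print(gaps.median())  # 100.0

# Fraction of the expected 10 Hz readings actually received over the span
expected = (ts.iloc[-1] - ts.iloc[0]) / 100 + 1
print(len(ts) / expected)
```

The median is robust to the occasional long gap, so it says more about normal sampling than the mean would, while the dropout list pinpoints where data went missing.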