
Monday, October 02, 2023

The Codeface of data mining

Zoe's training continues this week with time in the petrol station. This meant an early start so Darren dropped Ellie with us soon after 07:00. She was in a very chirpy mood and seemed eager to get to school. We walked her to the classroom and then headed home for breakfast. Little was planned for the day.

Diane spent the morning catching up on her admin and making some appointments. I retired to the study and did some research into future cruise options. I was looking for cruises that start and end in Southampton or Portsmouth. There were plenty of options, but not necessarily choices that we would select. There is no hurry. I will return to this another day.

Diane set me a challenge. Can I get the medical data off her iPhone so that she can present relevant measurements to the doctor? Yes, thought I. But how? A little bit of Googling told me that I could not extract specific items, but I could export the whole data set. I experimented with my own data. A couple of gigabytes of data was compressed into a ZIP file and dropped onto my iCloud. Now, what could I do with it?

I went to the iMac - this would need some power. I unzipped the file and found the various subfiles. The one that I needed was nearly 2GB in length and contained all of the raw data records for all of the measurements the Health Data app captures. I scanned through the file until I found the relevant identifier for the data she requested. Now that I could see the structure and knew the identifier, it was a matter of writing a program to select it and dump it into a separate, more manageable file.

It was nice to be back at the codeface. An hour or two later, I had a routine that would power through the many millions of raw data records, select the ones I wanted and create a CSV file. This meant that I could easily do further processing using Excel. It was a very satisfying bit of work, but I have already thought of some minor improvements I could make. That will be a project for tomorrow.
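
For anyone curious, a routine of this kind might be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it assumes the large file is XML, that each measurement is a `Record` element, and that the attributes of interest are `type`, `startDate`, `endDate`, `value` and `unit`. The function name `extract_records` is made up for the example. The file is read as a stream, so the whole 2GB never has to sit in memory at once.

```python
import csv
import xml.etree.ElementTree as ET

# Assumed attribute names on each <Record> element; adjust to match the real file.
FIELDS = ["type", "startDate", "endDate", "value", "unit"]


def extract_records(xml_source, csv_path, wanted_type):
    """Copy every Record whose type matches wanted_type into a CSV file.

    xml_source may be a file name or a binary file object.
    Returns the number of records written.
    """
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(FIELDS)

        # iterparse yields elements one at a time instead of loading the file whole.
        context = ET.iterparse(xml_source, events=("start", "end"))
        _, root = next(context)  # the first event is the start of the root element

        for event, elem in context:
            if event == "end" and elem.tag == "Record":
                if elem.get("type") == wanted_type:
                    writer.writerow([elem.get(name, "") for name in FIELDS])
                    count += 1
                # Drop everything parsed so far so memory use stays flat.
                root.clear()
    return count
```

A call might look like `extract_records("export.xml", "selected.csv", "HKQuantityTypeIdentifierHeartRate")`, where both file names and the identifier are placeholders for whichever file and measurement are wanted. The resulting CSV opens directly in Excel.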
