My Thinking On Various Topics

Collecting Data I Do Not Yet Know How To Use

If we ask questions – as Socrates warns us we must – eventually our ability to answer them becomes limited by the data we have available. If we wait until we have the question to start collecting data, it will take us some amount of time to get baseline data and then even more to measure change during experiments. We won’t have our answer for some time – weeks, months or even decades. We might never get an accurate answer.

This is why we must record data that we do not yet know how we will use. Some day we will have a question that these data can answer. This is why we must bear the burden of recording and storing information. Some day an important question will be quickly answered because of the hard work we put in now.

Here is a story to highlight one recent, though not all that important, example in which I experienced this.


In 2015 I started tracking how frequently I participate in various hobby activities as a way of measuring the enjoyability and balance of the life I have created, and am still creating, for myself.

It is more precise, and thus less overwhelming, for me to make the statement “I feel happiest when I surf at least 12 times a year but I’ve only surfed 8 so far” than to deal with some ambiguous emotional statement about “not feeling like I surf enough anymore”. The former is actionable – creating the opportunity for four surfing sessions is fully possible by booking a week’s vacation on whichever coast/island is getting waves in the next month. Problem solved.

The decision I periodically consider is whether 12 times a year is really cutting it and if not, what would be required to increase that number. The two biggest things I hypothesize are limiting my frequency of surfing are accessibility and other priorities – mainly the fact that I have a job and three kids.

I am easily able to test how limiting other priorities are by looking at windows I have now for other hobbies that could in theory be substituted.

Accessibility is more challenging to test – how much does it really impact frequency?

How much did I really surf when I lived two blocks from the beach & how does it compare to my target of 12 times per year? Was it really every day, 365 times per year, like I sometimes hyperbolically state? Or was it actually more like 2-3 times a week – for 100-150 total surfing sessions per year? Or was it really 2-3 times per week during the good half of the year and very few during the bad part for a total closer to 50?

Would greater accessibility affect things by a factor of two, or an order of magnitude?

How can I answer that question? I could try to recall how much I used to surf – and how that changed as I moved further away. But my memory isn’t precise and is biased toward remembering things I enjoy. I am frequently reminded of the fallibility of recall. I could try to collect data on the surfing population and compare frequency to accessibility – but that would be laborious and I wouldn’t have my answer for months.

Fortunately, I solved the problem for myself a long time ago. I logged data that is useful here.

I’ve been tracking things about myself for over a decade. The largest set is from a longitudinal project in which I track every minute of my time during a sample week, once per quarter. These data go back to when I was still in college, which means they capture a wide range of life – from student, to gap year, to young professional – from bachelor, to husband, to father – from Pittsburgh, to Newport Beach, to San Francisco.

I took a look at two sample weeks during my Newport Beach years, when I was a two-minute walk from checking the conditions. I surfed on six days of the week for a total of nine hours during the first week and on four days for a total of four hours during the second week. Averaging those, I ended up at five days and 6.5 hours per week. Assuming those weeks are an accurate sample (which I trust with decent certainty due to my methodology), that would put me in the range of 200-250 sessions a year. That is 15 – 20 times more than my current target.

Now some of that has to do with differences in lifestyle – but I can normalize for that by looking at how frequently I run now compared to periods closer to those sample weeks. That math accounts for a factor of two to three.

So we are left with the conclusion that moving into a house two blocks from the beach would increase the amount I surf on a yearly basis by a factor of 5 – 10. Not insignificant.
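For anyone who wants to check the arithmetic, here is a rough Python sketch of the estimate above. It is only an illustration: the variable and function names are made up, and the "naive" yearly figure assumes one session per surf day across all 52 weeks, which is an assumption rather than something read from the logs. The 200-250 range, the target of 12 and the lifestyle factor of two to three are taken from the text.

```python
WEEKS_PER_YEAR = 52
CURRENT_TARGET = 12  # sessions per year, from the text

# The two Newport Beach sample weeks described above.
sample_weeks = [
    {"days_surfed": 6, "hours": 9},
    {"days_surfed": 4, "hours": 4},
]

def mean(values):
    """Plain arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

avg_days = mean(w["days_surfed"] for w in sample_weeks)
avg_hours = mean(w["hours"] for w in sample_weeks)

# Assumption: one session per surf day, every week of the year.
# This is an upper bound; the text uses a more cautious 200-250.
naive_sessions = avg_days * WEEKS_PER_YEAR

# Ratio of the 200-250 range to the current target (roughly 17 to 21,
# which the text rounds to 15-20).
ratio_low = 200 / CURRENT_TARGET
ratio_high = 250 / CURRENT_TARGET

def accessibility_factor(raw_range, lifestyle_range):
    """Divide out the lifestyle effect to isolate accessibility.

    The lowest result pairs the low raw ratio with the high lifestyle
    factor, and the highest result pairs the high with the low.
    """
    raw_low, raw_high = raw_range
    life_low, life_high = lifestyle_range
    return raw_low / life_high, raw_high / life_low

# Rounded ratio (15-20) and lifestyle factor (2-3), both from the text.
access_low, access_high = accessibility_factor((15, 20), (2, 3))

print(f"average: {avg_days} days, {avg_hours} hours per week")
print(f"naive yearly sessions: {naive_sessions}")
print(f"ratio to target: {ratio_low:.1f} to {ratio_high:.1f}")
print(f"accessibility factor: {access_low} to {access_high}")
```

Dividing the low end of the raw ratio by the high end of the lifestyle factor, and the high end by the low end, is what gives the widest plausible range of 5 to 10.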

This ultimately leaves me with value questions. How important is surfing to me compared to other hobbies? How important are hobbies to me compared to other priorities?

But what I am not left with is ambiguity about the effect of the change – which means I can focus on those important questions and approach them from a solid base of facts rather than emotion.


Thanks to data I recorded previously for one purpose, I was able to quickly answer a new question with relative accuracy.

That is why we track things, even things we aren’t quite sure how we will use. At some point in the future we might be better equipped to use them. Tools, methodologies and questions that arise in the future are what will give value to our task of recording data today.

As I look back at the data I so carefully recorded and saved, my only regret is that I do not have more: more frequent samples, more details & more types of things recorded. This is what gives me the drive to track all the things I track now – a list that keeps growing.