Should I take aspirin?

Earlier this year, I purchased Dante Labs’ “whole genome Z” service, which includes 30x sequencing of every base pair of my DNA, plus an additional 100x on the protein coding regions.

I mostly did this for the raw data. I work in this space and I like to tinker. Using my very own data shields me from any concerns about privacy, consent, and appropriate usage. It’s also super useful professionally: I’m an advisor to folks who are responsible for health and genetic data from hundreds of thousands of patients and research participants. I find that handling my very own information has a way of clarifying my thinking around privacy, consent, and other topics related to good data stewardship.

My experience thus far with personalized genomics is that there’s not a huge amount of diagnostic or clinical value there unless you’re dealing with cancer, the risk of inherited conditions, or a challenging undiagnosed disease. I’m in my 40s now. I would already be aware if I carried any of the most readily diagnosed genetic disorders.

The joke is that 23andme told me that I’m probably male, most likely of northern European ancestry, sorta average height, probably brown eyes, likely brown hair … you know … all things you could tell at a glance by looking at me.


My expectations were low when Dante Labs sent a note inviting me to check out their new “wellness and lifestyle” report. I was 100% surprised to see that the first item on the list was a “high risk” for “Aspirin.” That’s new for me, and I was sort of hoping that the new data had unearthed some heretofore un-observed risk factor.

Spoiler alert: It had not.

I clicked in to see details, and got this rather opaque wall of generated text, which had obviously never been edited by a human. Maybe that’s what they meant when they claimed to be revolutionary in their use of artificial intelligence in these reports.

I didn’t know the word “urticaria,” so I googled it. It’s hives: red, raised, bumpy, itchy skin. Millions of people have it, it’s irritating, completely self diagnosable, and eminently self treatable.

I got curious. I take daily low dose aspirin because I’ve read about a constellation of positive effects. The question is, should I stop?

The really simple clarifying question would have been “do you break out in hives when you take aspirin?” The answer to that is “no,” but bear with me, I’m telling a story here.

Nerding out on genetics

The first question with any kind of genetic diagnosis is whether the data is correct. Fortunately, I’ve been a genomics fanboy for a while, and I was able to crack open my raw data from 23andme. Yes indeed, at position 179,220,638 on chromosome 5 I am heterozygous – with an “A” on one of the copies and a “C” on the other.

> grep rs730012 genome_Christopher_Dwan_v1_v2_v3_Full_20170926071925.txt  
> rs730012 .  5 . 179220638 . AC
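A plain grep matches substrings, so a search for rs730012 would also hit a longer ID such as rs7300123. Here is a minimal Python sketch of the same lookup with an exact match on the rsID. It assumes the raw data file is tab-separated with four columns (rsid, chromosome, position, genotype) and comment lines starting with “#”. The function names `find_genotype` and `is_heterozygous` are made up for illustration and are not part of any 23andme tooling.

```python
import io
from typing import Iterable, Optional


def find_genotype(lines: Iterable[str], rsid: str) -> Optional[dict]:
    """Return the record whose rsID matches exactly, or None if absent.

    Assumes tab-separated columns: rsid, chromosome, position, genotype.
    """
    for line in lines:
        line = line.strip()
        # Skip blank lines and the commented header block.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # Not a data row in the assumed layout.
        snp, chrom, pos, genotype = fields
        if snp == rsid:
            return {
                "rsid": snp,
                "chromosome": chrom,
                "position": int(pos),
                "genotype": genotype,
            }
    return None


def is_heterozygous(genotype: str) -> bool:
    """True when two different bases were called, e.g. 'AC'."""
    return (
        len(genotype) == 2
        and "-" not in genotype  # '-' is assumed to mark a no-call
        and genotype[0] != genotype[1]
    )


# The single data row below is the one shown in the grep output above.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs730012\t5\t179220638\tAC\n"
)

record = find_genotype(io.StringIO(SAMPLE), "rs730012")
print(record, is_heterozygous(record["genotype"]))
```

To run it against a real export, pass an open file handle instead of the `io.StringIO` wrapper, for example `find_genotype(open(path), "rs730012")`.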

After verifying data quality, the next question is “how sure are we about this?” There is a lot of truly tenuous associative research out there, and a naive approach is almost certain to lead you astray.

I took a look at ClinVar, a remarkably powerful and well curated database of the clinically actionable variants. It said that yes indeed, there is an association between this variant and an allergic reaction to aspirin.

I skimmed the abstracts of the three publications, and while it’s a clear association, it’s not the strongest of signals. The three studies were pretty small, with case and control groups of around 100 people each. Importantly, all three studies asked the question “is this genetic variant more common in people who break out when they take aspirin,” rather than asking the deeper and much more challenging question of -why- such people might have such a reaction.

Short version: It turns out that the reaction to aspirin is more common among people with a “C” at that position on either or both copies of chromosome 5. In industry parlance, I’ve got one copy of the “risk variant.”

One really important question when looking at this sort of thing is to determine how rare this genetic variant is. My friends at SNPedia have done a great job of parsing a bunch of different resources to show the answer. In this case, the answer is that among caucasians, my genotype is actually the most common type. It’s pretty rare in other populations, but for white folks like me – most of us have either one or two copies of the risk variant.

So what you have here is a super common genotype that’s associated with a minor, self diagnosable and self-treatable condition.

So should I stop with my daily aspirin? The answer is probably not.

Other genes, other diseases

SNPedia is my go-to for quick reads on genes and variants. I did a little poking around on aspirin and found a ton of interesting stuff. As just a single example, we’ve got rs6983267 over on chromosome 8.

Don’t look at me like that. All interesting people have at least one odd hobby where the nerd-o-meter reads “extreme.” This is one of mine.

There’s a study of more than 3,000 caucasians with the exact kind of cancer that killed my grandfather, and I have the risk variant (‘GT’) here too. The middle red box on the right, next to ‘GT’ says “aspirin reduces the risk of colorectal cancer.”

Sadly, this one didn’t make the cut for Dante Labs. According to them, I’m 100% free of colon cancer markers.

So what’s the point?

The point is this: This stuff is complicated and it is important. I’ve written before about how the risk averse culture in American medicine holds us back. This is a counterexample. A naive person might have looked at that report and said “oh hey, I’ll stop taking aspirin, I’ve got a risk factor.” The simple fact is that the risk is for a minor, eminently detectable condition, and there’s good data to suggest that taking aspirin (specifically, for me) reduces my risk of dying painfully of a kind of cancer that runs in my family.

I don’t want the FDA to shut Dante Labs down, but I do want Dante to get their act together and stop just yammering about “AI.”

A side note

In the course of writing and editing this, I have noticed a confounding factor.

Over the past couple of years, I -have- in fact noticed a couple of reddish patches on my torso. I’ve treated them with antifungals, but it didn’t have an effect. They don’t itch and they aren’t terribly visible, so I don’t worry about them.

So, just now, here at the end, I’m thinking that I might cut out the aspirin for a month and see if those patches fade. In that case, I will have learned something. After that, I will 100% resume the aspirin, because duh.

Time To Have An Idea

What are the most important pieces of professional advice you’ve ever received?

I remember one of mine clearly: It was in late 2004, and my colleague Bill told me that it was “time to have an idea.”

I had hired in as the first employee at a small consulting company in early summer. The founders had been handing me pre-specified projects for a few months. These early projects appeared on my desk ready-made, with the Statement Of Work (SOW) already written, the scope negotiated, and the customer interested mostly in when the resource (me) could be scheduled.

Now it was fall, and it was time for me to step up my game and spec my own work. I realize now that they were tired of carrying me.

In the spirit of “learn by doing,” they dumped me on the phone with a prospective customer, the IT department for Stanford.

That, in itself, was an incredible opportunity.

Rookies look down on “sales.” I know now about the grinding work that leads to calls like that. The series of interactions with gatekeepers whose only options are to say “no” or else to continue the conversation. The people on the other end of this call could say “yes.”

Also, their “no” would end the conversation entirely.

At the time, I wasn’t even savvy enough to be nervous.

I know now that we practiced a variant of “spin” selling, which focuses on understanding the customer’s pain points as the first part of the conversation. It’s not “our floor cleaning machine is great,” but rather “do you have any irritation connected with dirty floors?” Our model was characterized by a triangle of needs, features, and benefits. If your offer (the features) addresses the customer’s needs, and if the benefits to them (the perceived value) are greater than the cost, the deal pretty much closes itself.

I was prepped with the need: Stanford had recently done an audit and determined that they employed more people in computer support roles outside of IT than within it. Further, they had found at least 20 instances of an on-campus closet with a ton or two of recently added cooling to support a feral compute environment.

IT needed to justify their continued investment in scientific computing. The user community was routing around them.

The conversation went back and forth for about 20 minutes, introducing ourselves, re-hashing the situation, doing the human part of the meeting. Somewhere around that 20 minute mark, Bill, my colleague, boss, and co-owner of the company, popped into the group chat:

Time to have an idea, Dwan.

I was stumped. What did he mean?

Conversation continued, my teammates carrying me. Bill pinged again.

Dwan, write yourself a job.

So I went for it. Broke into the conversation and suggested that maybe it would help to have me … um … fly to California to spend a week with them? Yes. Having me onsite was totally part of it.

They were curious but unconvinced. What did I have in mind?

Maybe the need was that folks on campus were unaware of the resources available within central IT. So I would come out and give a series of talks on batch computing and how scientists might use the central IT compute cluster (the feature!). That would draw prospective users to the resources of central IT (the benefit!).

They dug it. There was a brief digression to fill in the details.

Bill texted again:

Keep going. There’s more. Go for it. You got this.

So I kept going. I suggested that I would also talk to the various user / stakeholders and ask them what they needed. With prompting from Bill, this turned into an offer to author a report describing the “capability gaps” between central IT’s offerings and the needs of the community. We would use my talks as bait to draw an audience with legitimate value, and leverage those connections to help central IT better align its services against its stakeholder needs.

Sorry for the consultant-speak. It’s what I do for a living.

On that call, it was enough. We got the work. I still sort of marvel that my words on that phone call created a trip to California.

As a mentor and friend would say about a different project, a decade later: “You spoke it into being.”

Knowing what I know now, I should have gone further. I could have helped more. My proposal was tactical rather than strategic. I should have offered to help with the root cause rather than just going after the symptoms. There should have been check-in and follow-up to make sure that I didn’t just drop a consultant report and leave, but instead fixed the problem for good.

How, exactly? Well that depends on a lot of other questions.

Did you have a “have an idea” moment?

If you’re further along in the career journey, can you give such a moment to a person on your team?

N of one

We are living through an uncomfortable period in the practice of medicine.

The dialogue between patient and physician is critically underserved, both in terms of tools for patients and physicians, and also in terms of the data context where that conversation takes place. This is unfortunate, because those are the moments of human to human care. Whether it’s a clinic visit, a lab test, a counseling or physical therapy session, the patient / provider meeting is when the full breadth of the caregiver’s experience and training can be brought to bear. At these moments, the subtle observations and pattern recognition that constitute diagnostic expertise come into play. These are also the times when the nuance and detail of the patient’s lived experience can be shared to influence the course of diagnosis and treatment.

Population health turns into personal medicine at the bedside.

That conversation between patient and physician ought to be a first class citizen in terms of tool development, but it is not. It is within our reach to build a clinical care environment that retains high standards of data integrity and privacy while also focusing on empowering the human beings in the room rather than the interests outside the door.

Due to the misaligned incentives that I’ve written about previously, the development of tools to support a data-rich conversation between patient and physician has generally taken a back-seat to software for billing, regulatory compliance, and mitigating risks to the care system. Recently, we have begun to instrument the clinic to support data gathering for research purposes. While this is a great idea on the face of it, it can have the unintended effect of leaving still less time for that critical conversation. Unless we can close the loop and bring the benefit of that instrumentation back to either physician or patient, it will be felt as friction, yet another loss.

I believe that we can have our data and do research on it too – and also that the clinical interaction is vastly more important than research use of the data we might gather along the way.

Research at no benefit to the participant

On the topic of research.

I’ve participated in a number of clinical research projects, mostly around genetics and genomics. The usual routine is to sit in a plastic chair and fill out a piece of paper using a pen tied to a clipboard. Some projects let me do the (still manual) data entry using a tablet. I used to gripe to the staff that this is a terrible, terrible way to gather data, but these days I just let them do their job and then blog or tweet about it later. The moment of truth comes with a needle stick, a swab, or a collection cup. Sometimes there are juice and cookies. Usually not.

Later, some anonymous lab will re-measure values that I’ve likely already got on my laptop. The math is churned for a few months, and perhaps somebody publishes. I usually won’t find out. I’ve stopped asking about that, because I’m bored with people who use HIPAA as an incantation to ward off further questions.

There are notable exceptions to this pattern of research’s stony indifference to the well being of the participants. The Coriell Personalized Medicine Collaborative stays in touch, nearly a decade after I spat in a tube for them. I get regular emails sharing the research results derived from my data. They also provide a crufty-but-effective web interface to allow me to see curated and IRB approved subsets of my results along with risk scores and background reading. For all the well-deserved flak we give (and should continue to give) 23andme for selling our data to the highest pharmaceutical bidder without asking first – they too give me useful and regular value.

All of Us is saying the right words about citizen researchers and “partners rather than subjects,” but the proof will be in the pudding. Their involvement with the likes of Google leaves me a bit cold.

In nearly two decades of energetically engaged participation, I have yet to encounter even one research project that offered to close the loop on the data they collected by making it available to my physician in the context of my clinical care. Nearly two decades after we completed the Human Genome Project, this basic courtesy to research participants is still not on the menu.

We are left to fend for ourselves, to separate the useful offerings from the snake oil in the direct to consumer marketplace.

Personal Data

I’ve written, more than once, about my ongoing attempts to get out in front of the curve of personalized / precision medicine. I can see where we’re going, and I want to live there as soon as possible. Early 21st century medicine is, by and large, reactive. Nobody wants to hear, “I wish we had caught this earlier,” but that’s what you get when the protocol is to wait for visible symptoms before testing for disease. Risk officers exacerbate this by steering physicians away from data, citing the risk of incidental findings and HIPAA violations.

I’m still irked about the physician who tried to refuse to screen me for the colorectal cancer that killed my grandfather, despite genetic and symptomatic evidence that indicated that it might be worth an extra look.

In the future, patients will have conversations about their care in the context of a well structured repository of personal data. That data will come from multiple sources, most of them nonclinical. Our data will be available, with appropriate localization for education and language, directly to the patient. We will be able to share it with our in-home caregivers and with a care team that includes both physicians and other health and wellness professionals.

In the future, nobody will ask for my previous doctor’s FAX number.

Put another way, our physicians should have the same data-driven advantages that we already see in retail sales, in entertainment, and in finance. Our doctors should have the kind of integrated data that data monopolies like Google, Amazon, and Apple already use to influence everything from our buying to our voting.

Of course, that will require changes to – without exaggeration – nearly every aspect of the clinical data environment. We should start now if we want to see it in our lifetimes.

Mercury Retrograde

A company named Arivale has been a partner in my personal data journey for the last year. Through them, I could get clinical-grade laboratory bloodwork every six months. The Arivale dashboard showed me my data in context, along with information from my self-monitoring devices (pulse, weight, sleep, and steps per day), as well as notes from online self-evaluations and conversations with a “wellness coach.”

We were a year in, and it was just getting good when they shut down. They cited operational costs, implying that this sort of service is too expensive to provide – at just about any price. I wish I could see the math on that.

I have written before about my elevated mercury levels and how I was able to do a personal experiment to see whether changing my diet to omit fish rich in heavy metals would reduce them. Here’s a full year plot of the data. It worked.

Of course, over the same year, my cholesterol shot up. Here’s a graph of my LDL levels and particle count over the same period:

My first reaction to these plots was to ask “what changed?” One obvious thing that changed was my diet. I had mostly stopped eating mammals and birds around the year 2001. When I cut out mercury rich fish, I re-introduced a bit of red meat. On reflection, I was probably looking to replace the celebration meal-centerpieces that had formerly involved high-on-the-food-chain fish. Also, a slow-cooker roast on a Sunday is pretty wonderful.

The experiment over the next six months will be to dial back down on the red meat and see what happens to the cholesterol. My other grandfather died of heart disease. It’s something I keep an eye on.


When I showed these plots to an experienced computational biologist whose PhD includes the word “statistics,” she had a strong reaction. To paraphrase: “What are they thinking, drawing straight lines between those points? That’s incredibly misleading. You got tested three times in a year. Three. This plot gives no insight into the underlying biological variability or the accuracy of the test! This is a gross oversimplification!”

I tried to make a case that the simple picture was accessible enough to spark curiosity and bring a novice like me into a data driven conversation. I told a story about different visualizations that would be suitable for everybody, including patients, data scientists, and also clinicians, all rendered based on the same underlying data. She was unimpressed: “It doesn’t matter which of those categories of person we’re talking about, this plot would be misleading to all of them.”

I trust my statistician friend, and I can see the importance of making sure that the data presentation is as accurate as possible. I’m bummed out that I didn’t get to write the feature-request note to Arivale.

The clinic of the future

I will end on a hopeful note: I recently had the opportunity to visit a clinic from the future.

When you walk into Lab100 at the Mt Sinai School of Medicine, it feels more like an Apple store than a medical establishment. Everything is smooth curves, laminate, and frosted glass. Even though the data that they gather is more accurate, better calibrated, and more natively digital (no manual data entry here), the experience is also more personal and human than anything I’ve previously experienced in a clinical context.

You know how the restaurants and vendors at Disney resorts already know your preferences before you speak up? Imagine that but at the doctor’s office or in the hospital.

A visit to Lab100 begins by sitting down with your caregiver, side by side on a couch. You and the clinician talk while looking at the same pane of glass, a large flat-panel display that shows your medical history and current complaints. Instead of being separated by technology – the flat panel monitor between me and my doctor – here technology brings people together to facilitate that all-important doctor / patient conversation.

The beginning of the visit is a review of your chart to make sure that it’s accurate, complete, and relevant. You move through stations to measure blood chemistry, balance, cognitive function, grip strength, and more. At each station there are video presentations explaining what is being done and why. Your results show up on the screen immediately, including a longitudinal view of how you tested before.

At the end, there is another sofa and an even larger screen where you see yourself in context. Your data is shown along with a cohort of other real people, matched to you by gender and age. Then you and the provider talk and make a plan together.

It’s compelling. I hope that the idea takes off.

It felt like rich people medicine, but the founders of the lab assured me that it is built out of commodity components and designed to be replicated without undue expense. In 2019, the Apple aesthetic is certainly high-end, but for all that, there is an Apple store in every major city in the country. It is apparently possible to have that rich-people feeling while still keeping the costs to shopping mall levels – as long as you’re selling consumer electronics and not health care.

Lab100 and whoever follows in Arivale’s footsteps are not the whole picture. There is a lot of work still to be done, and many entrenched interests to be appeased. We’ve spent decades building an empire tuned for billing, risk mitigation, compliance, and a weird and stilted flavor of data privacy. It’s going to take years to dig out of this hole.

For all that, the path is clear: Radically empower patients with access and control over their data, and make the physician/patient conversation a 1st class citizen in terms of tool development.

Let’s get on with it.

That consulting thing

People regularly ask, “how’s that consulting thing going?” It’s a fair question, and I don’t mind answering. The short answer is that it’s going better than I ever expected.

Conditions were basically perfect when I created my LLC in 2013: I had been employed by BioTeam for nine years. Since 2011, I had been dedicated nearly full time to a single customer, the NY Genome Center. The work with the genome center was all-consuming, so BioTeam had transitioned my day to day management responsibilities to other members of the team. That made it minimally disruptive to ease myself out and “go direct” with the Genome Center.

About a year later, NYGC was to the point where it didn’t make sense for them to rely on consultants anymore. I have great respect and love for the team and the mission, but I didn’t want to live in Manhattan. I came back to Boston and hired on as the leader of research computing at the Broad Institute.

During that first round of independence, I didn’t give much thought at all to business development or process. I had NYGC to rely on, and a few other small gigs sort of landed in my lap along the way.

Fast forward to March of 2017. I decided to depart the Broad and give the “independent” thing another go. It was a very different situation. Without that single large “anchor” customer in hand, business development was essential. I started blogging (yes, this blog is a business development activity), meeting friends and colleagues, tweeting more actively, and generally hustling to raise my profile and build a client base.

It worked.

Two years in, I’ve closed deals with twenty different companies: Seven biotechs, four technology vendors, three other consulting groups (mostly subcontracting for specialized skills and expertise), two universities, a pharmaceutical company, a regional hospital system, a government agency, and an independent research institute. Two of my clients are coming up on their two year anniversary of working with me. Eight others were brief “one and done” engagements.

It’s going well enough that I’ve had to deal with some of the challenges of success.

There’s a fair amount of road time. I’m platinum status with Marriott, “select executive” on Amtrak, and Mosaic with JetBlue. It’s frankly disheartening that, in terms of lifetime totals, I’ve spent nearly two full years’ worth of nights sleeping in hotels. On the other hand, I benefit from the ongoing biotech miracle that is Kendall Square: Nine of my clients are within an easy bicycle ride from my house.

Managing travel time is among the most important things that I do for my health, happiness, and profitability. It turns out to be straightforward for me to book myself into travel hell, which certainly -feels- like being productive. However, for me at least, that productivity is an illusion. Looking at the numbers, the months when I was running myself ragged going back and forth across the country were actually among my -least- profitable, especially factoring in the downtime that I need to recover from even a few weeks of being flat-out on the road.

The basics of communication and scheduling also take discipline. Slack is ubiquitous among my clients, which means that I check something like six different workspaces on a daily basis. My life would be utter insecure chaos without a password manager to manage logins and secrets. I practice vigorous defensive calendaring to ensure that my days don’t wind up chopped into useless shards of time and to make space for life maintenance activities. Along the way, I’ve disabled all but the most essential alerts on my desktop and mobile devices. I’ve replaced an interrupt-driven way of life (which actually just doesn’t work at scale) with norms and boundaries that allow people to get my attention without having to be online and interrupted all the time.

Independence was scary at first, both from a financial and from a lifestyle perspective. It certainly doesn’t work for everybody, and I’m cognizant of the luck and privilege that make it possible for me to live this way. I still have regular bouts of imposter syndrome where I realize that I cannot possibly be getting away with this.

As always, huge thanks to the community of colleagues, friends, and customers who make it all possible. And now, back to work!

25 Rules For Sons

A few days ago, one of my professional contacts shared a list titled “rules for sons” on LinkedIn. It was filled with advice like, “the man at a BBQ grill is the closest thing to a king,” and “carry two handkerchiefs. The one in your back pocket is for you. The one in your breast pocket is for her.”

Lists like this are always making the rounds. This one may have started with a 2015 book titled “Rules for my Unborn Son.” There are other versions online, but they’re all the same story. Manhood is about wearing sport coats, working the grill, asking the pretty girl out, marrying the woman, playing team sports, and maybe serving in the military.

I scrolled past, but found that it was still bugging me after a couple of minutes, so I went back and left a two word comment: “Misogynist claptrap.”

He (you knew it was a guy who shared the post, right?) responded almost immediately that I had clearly not read rule 23: “After writing an angry email, read it carefully. Then delete it.”

LOL, right?

I severed the LinkedIn connection. No harm, no foul – but I don’t go to LinkedIn looking for irritation, and arguing in the comments section has never, even once, changed anybody’s mind.

I shared the story with my spouse, and she said, “You should tell him, and you should tell his employer too. Those people scare me. They can’t hurt people like you, but they can and do hurt people like me.”

So in the spirit of “hey bro, not cool,” here’s the deal:

Truth in Advertising

My contact is the regional sales lead for a new company. His job is to open doors, get meetings, develop relationships, and eventually to make sales.

For a person in that position, LinkedIn is a marketing tool. This guy is an experienced professional. He knows what he’s doing here, at least at an unconscious level. His list – like this blog post – is signaling to his community about what kind of a person he is and what he expects of the rest of us.

No matter the title, this is not about any notional “sons.” Instead, this is how he expects the men in his professional network to act.

The message is that my contact is a certain kind of businessman. He has a firm handshake, looks you in the eye, and is an experienced negotiator. You know he’ll close the deal and then you can both go home to your wives and kids.

Under the hood, though, the inverse message is also clear: We’re supposed to think less of men who don’t make strong eye contact, who wear nontraditional clothing, who (for whatever reason) don’t marry the girl or work the grill. Those people aren’t up to this guy’s standards.

He also keeps a clean hankie on hand in case one of the ladies is overcome with emotion. Good dude, right?

So What’s The Problem?

Lists like this rise from a nostalgia for a time when gender and relationship roles were supposedly simpler. Men were men, women were women, and there was a well defined and correct way to fill either role.

Of course, those roles were radically asymmetric when it came to the workplace. Women were (and are) paid less, under-promoted, subject to outright abuse and subtle neglect, and generally treated like second class human beings. We’re going to be grappling with the fallout of those antique and chauvinist ideas for the rest of our lives.

Even worse: The idea that there is a single correct way to experience gender is incredibly toxic. Our society is slowly and haltingly coming to grips with the diversity of human experience – and lists like this, while superficially innocuous, are a step backwards.

Things weren’t actually simpler back then. Rather, people who didn’t fit into the dominant patterns either adopted an ill-fitting persona at great emotional and mental cost, or else they were excluded, ostracized, and subject to violence and even questionable medical procedures aimed at correcting them because they were somehow wrong at being themselves.

The problem with pushing this as some kind of misty eyed ideal in a professional / business context like LinkedIn should be apparent on the face of it.

The Inappropriate Thing

The thoughtful reader might go back, look at the list, and say that this blog post is a bit of an uptight overreaction. There is no particular word or phrase that stands out as inappropriately crossing some clear line. That’s how this sort of signaling works. The inappropriate stuff emerges gradually as we establish some spaces (the grill, the locker room, perhaps the board room or the industry event) as masculine and therefore subject to different rules.

This is the gateway to some really nasty stuff. Once we start down this road, we’re just a fraternity induction and an MBA away from the @GSElevator twitter feed.

More on that at the end of the post, but first allow me to share another example:

The All Male Conference

A few years back, I was invited to speak to a meeting of the US sales and engineering teams of an early stage technology company. I was already a customer, and my team was in the middle of a proof of concept evaluation of their new product.

When I arrived, I was struck by the massive gender imbalance. It was an all male event with at least 50 men in attendance. The two women at the conference center were the receptionist who gave me my badge, and the person who served the coffee.

The thing had a weird and macho vibe: When the national sales lead finished his presentation, his last slide was a picture of some bird, perhaps a duck, that he had run over on the way to the meeting. The room laughed, some uncomfortably. He left the grisly picture lingering on the projector while he took questions.

After my talk, I had an opportunity to meet with the executive team. I asked about the total lack of women at the event, and they laughed and said that they had just been talking about that.

LOL, weird, right?

I pushed, and they told me that they were working on it, but had to take it slow. Dead-duck guy? He brought in amazing sales numbers. He apparently saw any effort at diversity as diluting his talented team with charity cases and low performers. They didn’t want to alienate him, so they had to tread carefully.

I cancelled the proof of concept and insisted that they go only through me for future communications. I don’t know what other tricks dead-duck-guy had on offer, but I knew I didn’t want him talking to my team.

This particular story has a happy ending. The company did some soul searching and then hired a global head of diversity, who was the most forceful and intersectional person you’ll ever meet. They made a sustained effort to fix their biased and unbalanced team. Dead-duck guy may still be there, but I certainly never saw him around again.

Along the way they discovered something really important: Their product had a much larger potential audience than they had realized.

The company had been blind to that larger market because so many potential customers had been unwilling to take an initial meeting with dead-duck-guy and his team. They never showed up as qualified prospects.

Let me say that again: The macho, hyper-masculine approach of their best sales guy was alienating half of their target audience. The people who didn’t want to deal with him didn’t call back and explain themselves. They just moved on.

Maybe they took Rule 23 to heart.

Conference Season

I mentioned that the inappropriate thing usually shows up later. Let’s talk about that:

Conference season is starting. That means lots of mix-and-mingle events. The goal is relationship building. There will be coffees, breakfasts, lunch-and-learns, bar nights, and boozy steak dinners. There will be private presentations back at the Airbnb, invitations to travel and speak at the national convention, and so on.

As these invitations ramp up, my experience is that they move more and more into masculine spaces that exclude women. Once there, we always tend to see a bit more of the old “locker room” banter. It’s a ratchet that goes in only one direction.

This happens gradually to avoid anybody getting all weird and uptight when the enticements on offer depart from what we talk about in mixed company.

I mean seriously, you know why they put all the big industry events in Las Vegas, right? It’s not for the child care facilities, I can tell you that much.

Why Speak Up?

Real talk: I’m pretty nervous about posting this article.

I know that my contact will see it – I plan to send him a link (seems only fair). I know at least a few other people who will think I’m talking about them. I feel social pressure against rocking the boat and upsetting anybody.

I felt exactly the same way before I spoke up at that all-male team meeting. It’s super stressful to go to somebody’s party and tell them that they are doing it wrong. I was the invited speaker. I checked all the boxes of gender, race, and personal presentation to be welcome, and I still very nearly censored myself.

The thing that pushed me over the edge, then and now, is that this is the same pressure that keeps women silent in the face of uncounted insults and indignities. It gave me just the briefest glimpse of what it’s like to be on the unpleasant side of social pressure to conform, stay quiet, and obey. That brief glimpse was enough to motivate me to speak up then, and it continues to do so today.

As the saying goes: If you see something, say something.

I’m saying something.

Not cool, bro.

The network is slow: Part 1

Let me start off by agreeing that yes, the network is slow.

I’ve moved a fair amount of data over the years. Even when it’s only a terabyte or two, the network always seems uncomfortably slow. We never seem to get the performance we sketched out on the whiteboard, so the data transfer takes way longer than expected. The conversation quickly turns to the question of blame, and the blame falls on the network.

No disagreement there. Allow me to repeat: Yes, the network is slow.

This post is the first in a series where I will share a few simple tools and techniques to unpack and quantify that slowness and get things moving. Sometimes, hear me out, it’s not the entire network that’s slow – it’s that damn USB disk you have plugged into your laptop, connected over the guest wi-fi at Panera, and sprayed across half a continent by a helpful corporate VPN.

True story.

My point here is not to show you one crazy old trick that will let you move petabytes at line-rate. Rather, I’m hoping to inspire curiosity. Slow networks are made out of fast and slow pieces. If you can identify and remove the slowest link, that slow connection might spring to life.

This post is about old-school, low-level Unix/Linux admin stuff. There is nothing new or novel here. In fact, I’m sure that it’s been written a bunch of times before. I have tried to strike a balance to make an accessible read for the average command line user, while acknowledging a few of the more subtle complexities for the pros in the audience.

Spoiler alert: I’m not even going to get to the network in this post. This first one is entirely consumed with slinging data around inside my laptop.

Endless zeroes

When you get deep enough into the guts of Linux, everything winds up looking like a file. Wander into directories like /dev or /proc, and you will find files that have some truly weird and wonderful properties. The two special files I’m interested in today both live in the directory /dev. They are named “null” and “zero”.

/dev/null is the garbage disposal of Linux. It silently absorbs whatever is written to it, and never gives anything back. Read from it and all you get is an immediate end-of-file. A pager like “more” won’t even try!

energon:~ cdwan$ echo "hello world" > /dev/null 
energon:~ cdwan$ more /dev/null
/dev/null is not a regular file (use -f to see it)

/dev/zero is the opposite. It emits an endless stream of binary zeroes. It screams endlessly, but only when you are listening.
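
Here is a minimal sketch for poking at both files without a pager. The function names are my own for illustration. It counts the bytes that come back from /dev/null, and prints the first few bytes of /dev/zero as hex.

```shell
#!/bin/sh
# Count the bytes that come back when reading /dev/null (expect none).
# tr strips the padding that some versions of wc put in front of the count.
bytes_from_null() {
  wc -c < /dev/null | tr -d ' '
}

# Print the first N bytes of /dev/zero as hex digits, two per byte.
# od -v keeps od from collapsing repeated lines into a "*".
zero_bytes_hex() {
  head -c "$1" /dev/zero | od -An -v -tx1 | tr -d ' \n'
  echo
}

bytes_from_null     # prints 0
zero_bytes_hex 8    # prints 0000000000000000
```

The head command is what stops the stream. Without it, /dev/zero would keep handing over zeroes for as long as something is reading.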

If you want your computer to spin its wheels for a bit, you can connect the two files together like this:

energon:~ cdwan$ cat /dev/zero > /dev/null

This does a whole lot of nothing, creating and throwing away zeroes just as fast as one of the processors on my laptop can do it. Below, you can see that my “cat” process is taking up 99.7% of a CPU – which makes it the busiest thing on my system this morning.

Which, for me, raises the question: How fast am I throwing away data?

Writing nothing to nowhere

If my laptop, or any other Linux machine, is going to be involved in a data transfer, then the maximum rate at which I can pass data across the CPU matters a lot. My ‘cat’ process above looks pretty efficient from the outside, with that 99.7% CPU utilization, but I find myself curious to know exactly how fast that useless, repetitive data is flowing down the drain.

For this we need to introduce a very old tool indeed: ‘dd’.

When I was an undergraduate, I worked with a team in university IT responsible for data backups. We used dd, along with a few other low level tools, to write byte-level images of disks to tape. dd is a simple tool – it takes data from an input (specified with “if=”) and sends it to an output (specified with “of=”).

The command below reads data from /dev/zero and sends it to /dev/null, just like my “cat” example above. I’ve set it up to write a little over a million 1kb blocks, which works out to exactly a gigabyte of zeroes. On my laptop, that takes about 2 seconds, for a throughput of something like half a GB/sec.

energon:~ cdwan$ dd if=/dev/zero  of=/dev/null bs=1024 count=1048576
1073741824 bytes transferred in 2.135181 secs (502880950 bytes/sec)

The same command, run on the cloud server hosting this website, finishes in a little under one second.

[ec2-user@ip-172-30-1-114 ~]$ dd if=/dev/zero  of=/dev/null bs=1024 count=1048576
1073741824 bytes (1.1 GB) copied, 0.979381 s, 1.1 GB/s

Some of this difference can be attributed to CPU clock speed. My laptop runs at 1.8GHz, while the cloud server runs at 2.4GHz. There are also differences in the speed of the system memory. There may be interference from other tasks taking up time on each machine. Finally, the system architecture has layers of cache and acceleration tuned for various purposes.

My point here is not to optimize the velocity of wasted CPU cycles, but to inspire a bit of curiosity. While premature optimization is always a risk – I will happily take a couple of factors of two in performance by thinking through the problem ahead of time.
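
To put the two runs side by side, here is a small sketch of the arithmetic using the byte counts and timings reported by dd above. The helper names are my own, and I am taking 1 MB to mean one million bytes.

```shell
#!/bin/sh
# Convert a byte count and an elapsed time in seconds to MB/sec.
# Assumption: 1 MB = 1,000,000 bytes.
mb_per_sec() {
  awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

# Ratio of two numbers, for comparing elapsed times.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

mb_per_sec 1073741824 2.135181   # laptop: prints 502.9
mb_per_sec 1073741824 0.979381   # cloud server: prints 1096.3
ratio 2.135181 0.979381          # laptop time / cloud time: prints 2.18
```

So the cloud server is a bit more than twice as fast at this particular kind of nothing, which is more than the difference in clock speed alone would explain.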

As an aside, you can find out tons of useful stuff about your Linux machine by poking around in the /proc directory. Look, but don’t touch.

[ec2-user@ip-172-30-1-114 ~]$ more /proc/cpuinfo | grep GHz
model name : Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz

Reading and writing files

So now we’ve got a way to measure the highest speed at which a single process on a single CPU might be able to fling data. The next step is to ask questions about actual files. Instead of throwing away all those zeroes, let’s catch them in a file instead:

energon:~ cdwan$ dd if=/dev/zero  of=one_gig  bs=1024 count=1048576
1073741824 bytes transferred in 7.431081 secs (144493358 bytes/sec)

energon:~ cdwan$ ls -lh one_gig
-rw-r--r--  1 cdwan  staff   1.0G Mar  5 08:57 one_gig

Notice that it took about three and a half times as long to write those zeroes to an actual file instead of hurling them into /dev/null.

The performance when reading the file lands right in the middle of the two measurements:

energon:~ cdwan$ dd if=one_gig of=/dev/null bs=1024 count=1048576
1073741824 bytes transferred in 4.222885 secs (254267367 bytes/sec)

At a gut level, this makes sense. It kinda-sorta ought-to take longer to write something down than to read it back. The caches involved in both reading and writing mean we may see different results if we re-run these commands over and over. Personally, I love interrogating the behavior of a system to see if I can predict and understand the way that performance changes based on my understanding of the architecture.
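
If you want to watch the caches at work, one rough approach is to write a scratch file and then read it back a few times in a row. This is a sketch under a few assumptions of mine: a 64 MiB file, 1 MiB blocks rather than the 1 KB blocks used above, and a temporary file from mktemp. dd prints its timing summary on stderr after each pass.

```shell
#!/bin/sh
# Write a scratch file of zeroes, then read it back several times.
# Usage: write_then_reread SIZE_IN_MIB PASSES
write_then_reread() {
  wtr_mib="$1"
  wtr_passes="$2"
  wtr_scratch="$(mktemp)" || return 1

  # Write SIZE_IN_MIB blocks of 1 MiB (1048576 bytes) each.
  dd if=/dev/zero of="$wtr_scratch" bs=1048576 count="$wtr_mib" || return 1

  # Read the same file back PASSES times, discarding the bytes.
  wtr_i=1
  while [ "$wtr_i" -le "$wtr_passes" ]; do
    dd if="$wtr_scratch" of=/dev/null bs=1048576 || return 1
    wtr_i=$((wtr_i + 1))
  done

  # Report the size of the file on stdout, then clean up.
  wc -c < "$wtr_scratch" | tr -d ' '
  rm -f "$wtr_scratch"
}

write_then_reread 64 3
```

On many systems a freshly written file is still sitting in memory, so even the first read may come back faster than the disk could manage. The numbers will vary from run to run, which is rather the point.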

I know, you were hoping to just move data around at speed over this terribly slow network. Here I am prattling on about caches and CPUs and RAM and so on.

As I said above, my point here is not to provide answers but to provoke questions. Agreed that the network is slow – but perhaps there is some part of the network that is most to blame.

I keep talking about that USB disk. There’s a reason – those things are incredibly slow: Here are the numbers for reading that same 1GB file from a thumb drive:

energon:STORE N GO cdwan$ dd if=one_gig_on_usb of=/dev/null bs=1024 count=1048576
1073741824 bytes transferred in 75.596891 secs (14203518 bytes/sec)
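
To see why the slowest link matters, here is a back-of-the-envelope sketch that takes the bytes/sec figures reported by dd in this post and asks how long a terabyte would take at each rate. The helper name is mine, and I am assuming that 1 TB means 10^12 bytes and that each rate could be sustained for the whole transfer.

```shell
#!/bin/sh
# Hours needed to move a number of bytes at a given rate in bytes/sec.
hours_to_move() {
  awk -v bytes="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", bytes / rate / 3600 }'
}

one_tb=1000000000000   # assumption: 1 TB taken as 10^12 bytes

# Rates below are the bytes/sec values from the dd runs on my laptop.
echo "zeroes to /dev/null: $(hours_to_move "$one_tb" 502880950) hours"
echo "write to a file:     $(hours_to_move "$one_tb" 144493358) hours"
echo "read from a file:    $(hours_to_move "$one_tb" 254267367) hours"
echo "read from USB stick: $(hours_to_move "$one_tb" 14203518) hours"
```

At the thumb drive's rate, a terabyte takes most of a day before the network ever gets involved.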

That’s enough for one post. In the next installment, I will show a few examples of one of my all time favorite tools: iperf.

Biology is weird

Biology is weird. The data are weird, not least because models evolve rapidly. Today’s textbook headline is tomorrow’s “in some cases,” and next year’s “we used to think.”

It can be hard for non-biologists, particularly tech/math/algorithm/data science/machine learning/AI folks, to really internalize the level of weirdness and uncertainty encoded in biological data.

It is not, contrary to what you have read, anything like the software you’ve worked with in the past.  More on that later.

This post is a call for humility among my fellow math / computer science / programmer type people.  Relax, roll with it, listen first, come up to speed. Have a coffee with a biologist before yammering about how you’re the first smart person to arrive in their field. You’ll learn something. You’ll also save everybody a bit of time cleaning up your mess.


Don’t be the person who walks into a research group meeting carrying a half read copy of “Genome” by Matt Ridley, spouting off about how all you need is to get TensorFlow running on some cloud instances under Lambda and you’re gonna cure cancer.

This is not to speak ill of “Genome,” it’s a great book, and I’m super glad that lots of people have read it – but it no more qualifies you to do the heavy lifting of genomic biology than Lisa Randall’s popular press books prepare you for the mathematical work of quantum physics.

You’ll get more cred with a humble attitude and a well thumbed copy of “Life Ascending” by Nick Lane. For full points, keep Horace Judson’s “The Eighth Day of Creation” on the shelf.  Mine rests between Brooks’ “The Mythical Man Month” and “Personality” by Daniel Nettle.

The More Things Change

Back in 2001, the human genome project was wrapping up.  One of the big questions of the day was how many genes we would find in the completed genome.  First, set aside the important but fundamentally un-answerable question of what, exactly, constitutes a gene.  Taking a simplistic and uncontroversial definition, I recall a plurality of well informed people who put the expected total between 100,000 and 200,000.

The answer?  Maybe a third to a sixth of that.  The private sector effort, published in Science, reported an optimistically specific 26,588 genes.  The public effort, published in Nature, reported a satisfyingly broad 30,000 to 40,000. 

There was a collective “huh,” followed by the sound of hundreds of computational biologists making strong coffee. 

This happens all the time in Biology. We finally get enough data to know that we’ve been holding the old data upside down and backwards.

The central dogma of information flow from DNA to RNA to Protein seems brittle and stodgy when confronted with retroviruses, and honestly a bit quaint in the days of CRISPR.  I’ve lost count of the number of lower-case modifiers we have to put on the supposedly inert “messenger molecule” RNA to indicate its various regulatory or even directly bio-active roles in the cell.

Biologists with a few years under their belt are used to taking every observation and dataset with a grain of salt, to constantly going back to basics, and to sighing and making still more coffee when some respected colleague points out that that thing … well … it’s different than we expected.

So no, you’re not going to “cure cancer” by being the first smart person to try applying math to Biology.  But you -do- have an opportunity to join a very long line of well meaning smart people who wasted a bunch of time finding subtle patterns in our misunderstandings rather than doing the work of biology, which is to interrogate the underlying systems themselves.


To this day, whenever I look at gene expression pathways I think: “If I saw this crap in a code review, I would send the whole team home for fear of losing my temper.”

My first exposure to bioinformatics was via a seminar series at the University of Michigan in the late 90’s. Up to that point, I had studied mostly computer science and artificial intelligence. I was used to working with human-designed systems. While these systems sometimes exhibited unexpected and downright odd behaviors, it was safe to assume that a plan had, at some point, existed. Some human or group of humans had put the pieces of the system together in a way that made sense to them.

To my eye, gene expression pathways look contrived. It’s all a bit Rube Goldberg down there, with complex and interlocking networks of promotion and inhibition between things with simple names derived from the names of famous professors (and their pets). 

My design sensibilities keep wanting to point out that there is no way that this mess is how we work, that this thing needs a solid refactor, and that … dammit … where’s the coffee?

It gets worse when you move from example to example and keep finding that these systems overlap and repeat in the most maddening way. It’s like the very worst sort of spaghetti code, where some crazy global variable serves as the index for a whole bunch of loops in semi-independent pieces of the system, all running in parallel, with an imperfect copy paste as the fundamental unit of editing.

This is what happens when we apply engineering principles to understanding a system that was never engineered in the first place.

Those of us who trained up on human designed systems apply those same subconscious biases that show us a face in the shadows of the moon. We’re frustrated when the underlying model is not based on noses and eyes but rather craters and ridges. We go deep on the latest algorithm or compute system – thinking that surely there’s reason and order and logic if only we dig deep enough.

Biologists roll with it. 

They also laugh, stay humble, and drink lots of coffee.

Fixing the Electronic Medical Mess

In my previous blog post, I talked about the fact that medical records are a dumpster fire from a scientific data perspective. Apparently this resonated for people.

This post begins to sketch some ideas for how we might start to correct the problem at its root.

Lots of people have thought deeply about this stuff. One specific example is the Apperta Foundation whose white paper makes a wonderful introduction to the topic.

@bentoth’s second point, above, is exactly correct. Until we put the patient at the center of the medical records system, we’re going to be digging in the trash.

The question is how we get from here to there.

Not Starting From Zero

Before digging in, I want to address a very valid objection to my complaint:

It’s true: Even given the current abysmal state of things, researchers are still making important discoveries. This indicates to me that it will be well worth our while to put some time and effort into improving things at the source. If we can get value out of what we’ve got now, imagine the benefits to cleaning it up!

Who Speaks for the Data?

One of the first steps towards better data, in any organization, is to identify the human beings whose job is, like the Lorax, to “speak for the data.” Identifying, hiring, and radically empowering these folks is a recommendation that I make to many of my clients.

Just … please … don’t call them a “data janitor.”

If you tell me that you have “data janitors,” I know that you consider your data to be trash. Beyond that, I also know that you consider curation, normalization, and data integration to be low-respect work that happens after hours and is not part of the core business mission. It’s not a big jump from there to realize that the structures and incentives feeding the problem aren’t going to change. Instead, you’re just going to hire people to pick through the trash and stack up whatever recyclables they can find.

I’ve even heard people talk about hiring a “data monkey.” Really, seriously, just don’t do that, even in casual conversation. It’s not cool.

Who does the work?

It takes a huge amount of work to capture primary observations, and still more effort to connect them to the critical context in which they were created. Good metadata is what allows future users to slice, dice, and confidently use datasets that they did not personally create.

Then there is the sustained work of keeping data sets coherent. Institutional knowledge and assumptions change and practices drift over time. Even though files themselves may not be corrupted, data always seems to rot unless someone is tending it.

This work cannot simply be layered on as yet another task for the care team. Physicians and nurses are already overwhelmed and overworked. Adding another layer of responsibility and paperwork to their already insane schedules will not work.

We need to find a resource that already exists, that scales naturally with the problem, and who also has a strong self-interest in getting things right.

Fortunately, such a resource exists: We need to leverage patients and their families. We need to empower them to curate and annotate their own medical records, and we need to do it in a scalable and transparent way.

I’m willing to bet that if we start there, we’ll wind up with a population who are more than happy, for the most part, to share slices of their data because of the prospective benefits to people other than themselves.

The tools already exist

Health systems don’t encourage it, but patients can and do demand access to the data derived from their own bodies. People suffering from rare or undiagnosed diseases make heavy use of this fact. They self-organize, using services like Patients Like Me or Seqster to carry their own information back and forth between the data silos built by their various providers and caregivers. Similarly, physicians can work with services like the Matchmaker Exchange to find clues in the work of colleagues around the world.

Unfortunately, there is no easy way for this cleaned and organized version of the data to get back into the EMR from which it came. That’s the link to be closed here – people are already enthusiastically doing the work of cleaning this data. They are doing it on their own time and at their own expense because the self-interest is so clear.

The job of the Data Lorax is to find a way to close that loop and bring cleaned data back into the EMR. This is different from what we do today, so we’re going to need to adapt a lot of systems and processes, and even a law or a rule here or there.

Fortunately, it’s in everybody’s interest to make the change.

The Electronic Medical Mess

I posted a quick tweet this morning about the state of data in health care.

Over the years, I’ve worked with at least half a dozen projects where earnest, intelligent, diligent folks have tried to unlock the potential stored in mid to large scale batches of electronic medical records. In every case, without exception, we have wound up tearing our hair and rending our garments over the abysmal state of the data and the challenges in getting access to it at all. It is discordant, incomplete, and frequently just plain-old incorrect.

I claim that this is the result of structural incentives in the business of medicine.

What is a Medical Record?

Years ago the medical record was how physicians communicated amongst themselves. The “clinical narrative” was a series of notes written by a primary care physician, punctuated by requests for information and answers from specialists. Physicians operated with an assumption of privacy in these notes, since patients didn’t generally ask to see them. Of course they were still careful with what they wrote. If things went sideways, those notes might wind up being read aloud in front of a judge and jury.

In the 80’s, electronic medical records (EMRs) added a new dimension to this conversation. EMRs were built, in large part, to support accurate and timely information exchange between health care organizations and “payers” including both corporate and government insurance. EMRs digitized the old clinical narrative basically unchanged. They sometimes allowed in-house lab values to be transferred as data rather than text, though in many cases that sort of feature came much later. Most of the engineering effort went into building a framework for billing and payment.

The savvy reader will note that neither of these is a particularly good way to build a system for the collection of patient data.  Instead, we’re dealing with risk avoidance.

A Question of Risk and Cost

Being the Chief Information Officer (CIO) of a health care system or a hospital is a hard, stressful, and frequently thankless job. Information Technology (IT) is usually seen as a cost center and an expense rather than as a driver of revenue. A savvy CIO is always looking for ways to reduce costs and allow their institution to put more dollars directly into the health care mission. Successful hospital CIOs spend a lot of time thinking about risk. There are operational risks from attacks like ransomware, compliance risks, risks that the hospital will expose patient data inappropriately, financial risks from lost revenue, legal risks from failing to meet standards of care, and many more.

These pressures lead to a very sensible and consistent mindset among hospital CIOs: They have a healthy skepticism of the “new shiny,” an aversion to change, and a visceral awareness of their responsibility to consistent and compliant operations.

So physicians are incentivized to avoid litigation, hospital information systems are incentivized to reduce exposure, and the core software we use for the whole mess is written primarily to support financial transactions.

Every single person I’ve ever met in the business and practice of health care, without exception, wants to improve patient lives. This is not a case where we need to find the bad, the malevolent, or the incompetent people and replace them. Instead, it’s one of those situations where good, smart hardworking people are stuck with a system that we all know needs a solid shake-up.

That means that when someone (like me) shows up and proposes that we change a bunch of hospital practices (including modifying that damn EMR software) so that we can gather better data, it falls a bit flat. If I reveal my grand plan to take the data and use it for some off-label purpose like improving the standard of care globally, I am usually politely but firmly shown the door.

But it gets worse.

Homemade Is Best

Back in the bad old days, it was possible to convince ourselves that observations made by physicians were the best and only data that should be used in the diagnosis of disease. That’s demonstrably untrue in the age of internet connected scales and wearable pulse and sleep monitors. I’ve written before about the reaction I receive when I show up to my doctor as a Patient With A Printout (PWP). Even here in 2019, there are not many primary care physicians who are willing to look at data from a direct to consumer genetics or wellness company.

The above isn’t strictly true. I know lots of physicians who have a very modern approach to data when we talk over coffee or dinner. However, at work, they have to do a job. The way they are allowed to do that job is defined by CIOs and hospital Risk Officers who grow nervous when we try to introduce outside data sources in the clinical context. What assertions do we have that these wearable devices meet any standards of clinical care? Who, they might ask, will be legally responsible if a diagnosis is missed or an incorrect treatment applied?

So we’re left with a population health mindset that says “never order a test unless you know what you’re going to do with the result,” except that in this case it’s “don’t look at a test that was already done, you might wind up with an inconvenient incidental finding, and then we’ll have to talk to legal.”

Health systems incentivize risk avoidance above more accurate or timely data. They do this because they are smart, and because they want to stay in business.

So we collect information with a system tuned for billing, run by people whose focus is on risk avoidance. Is it any wonder that when we extract that data, what we find is a conflicting and occasionally self-contradictory mess?

There’s no incentive to have it any other way.

A Better Way

Here in 2019, most people who pay attention to such things believe that data driven health insights will lead to better clinical outcomes, better quality of life, lower overall costs for health care, and many other benefits.


One ray of hope comes from the online communities that spring up to connect people with rare and terrible diseases. These folks share information amongst friends, family, researchers, and physicians as they search desperately for any hope of a cure. Along the way, they create and curate incredibly valuable data resources. The difference between these patient centric repositories and any extraction that we might get from an EMR is simply night and day.

A former colleague was fond of saying, “a diagnosis of cancer really clarifies your thinking about the relative importance of data privacy.”

Put another way: If we put the patient at the center of the data equation, rather than payment, we’re really not very far from a much better world – and all those wonderful technologies I mentioned will suddenly be quite useful.

Unfortunately, that’s a political question these days:

So where do we go from here? I’m not sure.

I do know for certain that -merely- flinging the messy pile of junk against the latest Machine Learning / Artificial Intelligence / Natural Language Processing software, without addressing the underlying data quality, is unlikely to yield durable and useful results.

Garbage in, garbage out – as the saying goes.

I would love to hear your thoughts.

Letting the genome out of the bottle

About eleven years ago, in January of 2008, the New England Journal of Medicine published a perspective piece on direct to consumer genetic tests, “Letting the Genome out of the Bottle, Will We Get Our Wish.” The article begins by describing an “overweight” patient who “does not exercise.” This man’s children have given him the gift of a direct to consumer genetics service at the bargain price of $1,000.

The obese person who (did we mention) can’t be troubled to go to the gym is interested in medical advice based on the fact that they have SNPs associated with both diabetes and cardiovascular disease. The message is implied in the first paragraph, and explicitly stated in the last:  “Until the genome can be put to useful work, the children of the man described above would have been better off spending their money on a gym membership or a personal trainer so that their father could follow a diet and exercise regimen that we know will decrease his risk of heart disease and diabetes.”

Get it?  Don’t bother us with data.  We knew the answer as soon as your heavy footfalls sounded in the hallway.  Hit the gym.

The authors give specific advice to their colleagues “for the patient who appears with a genome map and printouts of risk estimates in hand.”  They suggest dismissing them:  “A general statement about the poor sensitivity and positive predictive value of such results is appropriate … For the patient asking whether these services provide information that is useful for disease avoidance, the prudent answer is ‘Not now — ask again in a few years.'”

Nowhere do the authors mention any potential benefit to taking a glance at the sheaf of papers this man is clutching in his hands.

Just 10 years ago, a respected and influential medical journal told primary care physicians to discourage patients from seeking out information about their genetic predisposition to disease.  Should someone have the nerve to bring a “printout,” they advise their peers to employ fear, uncertainty, and doubt. They suggest using some low-level statistical jargon to baffle and deflect, before giving answers based on a population-normal assumption.

I’m writing this post because I went to the doctor last week and got that exact answer, almost verbatim.  I already went off about this on Twitter, but I think the topic may benefit from a more nuanced take.

More on that at the end of the post.

Eight bits of history

For all its flaws, the article does serve as a fun and accessible reminder of how far we have come in a decade.

I did 23andme when it first came out. I’ve downloaded my data from them a bunch of times.  Here are the files that I’ve downloaded over the years, along with the number of lines in each file:

cdwan$ wc -l genome_Christopher_Dwan_*
576119 genome_Christopher_Dwan_20080407151835.txt
596546 genome_Christopher_Dwan_20090120071842.txt
596550 genome_Christopher_Dwan_20090420074536.txt
1003788 genome_Christopher_Dwan_Full_20110316184634.txt
1001272 genome_Christopher_Dwan_Full_20120305201917.txt

The 2008 file contains about 576,000 data points.  That doubled to a bit over a million when they updated their SNP chip technology in 2011.

The authors were concerned that “even very small error rates per SNP, magnified across the genome, can result in hundreds of misclassified variants for any individual patient.”  When I noticed that my results from the 2009 download were different from those in 2008, I wrote a horrible PERL script to figure out the extent of the changes. I still had it sitting around on my laptop, so I ran it again today. I was somewhat shocked that it worked on the first try, a decade and at least two laptops later!

My 23andme results were pretty consistent. Of the SNPs that were reported in both v1 and v2, my measurements differ at a total of 54 loci. That’s an error rate of about one hundredth of one percent. Not bad at all, though certainly not zero.
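
A comparison like this is easy to sketch. The following is a minimal Python version, not the original script. It assumes the tab-separated layout of a 23andme raw data export (rsid, chromosome, position, genotype, with comment lines starting with "#"). It also assumes that a genotype containing "-" is a no-call that should be skipped, and that allele order carries no meaning. The function names are made up for illustration.

```python
import sys


def load_genotypes(lines):
    """Map rsid -> genotype from the lines of a 23andme-style raw data file."""
    calls = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        rsid, genotype = fields[0], fields[3]
        if "-" in genotype:
            continue  # assumption: "--" marks a no-call
        # Assumption: allele order is not meaningful, so "AG" equals "GA".
        calls[rsid] = "".join(sorted(genotype))
    return calls


def compare(old, new):
    """Return (shared, differing) counts for SNPs called in both files."""
    shared = old.keys() & new.keys()
    differing = sum(1 for rsid in shared if old[rsid] != new[rsid])
    return len(shared), differing


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        shared, differing = compare(load_genotypes(f_old), load_genotypes(f_new))
    if shared:
        print(f"{differing} of {shared} shared SNPs differ "
              f"({100 * differing / shared:.4f}%)")
```

Saved under a hypothetical name like compare_snps.py, it would be run with two downloads as arguments. If nearly all of the roughly 576,000 SNPs in the 2008 file appear in both downloads, 54 differences works out to a bit under 0.01%.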

For comparison, consider the height and weight that usually get taken when you visit a doctor’s office. In my case, they do these measurements with shoes and clothing on – meaning that I’m an inch taller (winter boots) and about 8 pounds heavier (sweater and coat) if I see my doctor in the winter. Those are variations of between 1% and 5%.

Fortunately, nobody ever looks at adult height or weight as measured at the doctor’s office. They put us on the scale so that the practice can charge our insurance providers for a physical exam, and then the doctor eyeballs us for weight and any concealed printouts.

A data deluge

Back to genomics: $1,000 will buy a truly remarkable amount of data in late 2018.  I just ordered a service from Dante Labs that offers 30x “read depth” on my entire genome.  They commit to measuring each of my 3 billion letters of DNA at least 30 times.  Taken together, that’s 90 billion data points, or roughly 180,000 times more measurements than that SNP chip from a decade ago (rounding the chip to 500,000 data points).  Of course, there’s a strong case to be made that those 30 reads of the same location are experimental replicates, so it’s really only 3 billion data points or 6,000 times more data. Depending on how you choose to count, that’s either 12 or 17 doublings over a ten year span.

Either way, we’re in a world where data production doubles faster than once per year.

This is a rough and ready illustration of the source of the fuss about genomic data.  Computing technology, both CPU and storage, seems to double in capacity per dollar every 18 months. Any industry that exceeds that tempo for a decade or so is going to experience growing pains.
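
The arithmetic behind those doublings can be checked in a few lines. This is a back-of-the-envelope sketch that assumes the same round numbers used above: a 500,000 point SNP chip, a 3 billion letter genome, 30x read depth, a ten year span, and an 18 month doubling time for computing. The constant and function names are made up for illustration.

```python
import math

SNP_CHIP_POINTS = 500_000        # assumption: the 2008 chip, rounded down
GENOME_LETTERS = 3_000_000_000   # roughly 3 billion base pairs
READ_DEPTH = 30                  # reads per position
YEARS = 10
COMPUTE_DOUBLING_MONTHS = 18     # the rule of thumb quoted above


def doublings(ratio):
    """Number of times a quantity must double to grow by `ratio`."""
    return math.log2(ratio)


def summarize(label, data_points):
    """Print growth relative to the SNP chip and return the doubling count."""
    ratio = data_points / SNP_CHIP_POINTS
    n = doublings(ratio)
    months = YEARS * 12 / n
    print(f"{label}: {ratio:,.0f}x, {n:.1f} doublings, "
          f"one every {months:.1f} months")
    return n


# Treat the 30 reads per position as replicates of a single measurement.
replicates_n = summarize("reads as replicates", GENOME_LETTERS)
# Count every read of every position as its own data point.
every_read_n = summarize("every read counted", GENOME_LETTERS * READ_DEPTH)

# Growth in computing over the same span at one doubling per 18 months.
compute_n = YEARS * 12 / COMPUTE_DOUBLING_MONTHS
print(f"computing: {2 ** compute_n:,.0f}x, {compute_n:.1f} doublings")
```

Either way of counting gives a doubling every seven to ten months. Computing on an 18 month cadence manages fewer than seven doublings in the same decade, or about a hundredfold improvement, against a 6,000-fold or 180,000-fold increase in data.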

To make the math simple, I omitted the fact that this year’s offering -also- gives me an additional 100x of read depth within the protein coding “exome” regions, as well as some even deeper reading of my mitochondrial DNA.

One real world impact of this is that I’m not going to carry around those raw reads on my laptop anymore. The raw files will take up a little more than 100 gigabytes, which would be 20% of my laptop hard disk (or around 150 CD-ROMs).

I plan to use the cloud, and perhaps something more elegant than a 10 year old single threaded PERL script, to chew on my new data.

The more things change

Back to the point:  I’m writing this post because, here in late 2018, I got the -exact- treatment that the 2008 article recommends. It’s worse than that, because I didn’t even bring in anything as fuzzy as genotypes or risk variants.  Instead, I brought lab results, ordered through Arivale, and generated by a Labcorp facility to HIPAA standards.

I’ve written about Arivale before.  They do a lab workup every six months. That, coupled with data from my wearable and other connected devices, provides the basis for ongoing coaching and advice.

My first blood draw from Arivale showed high levels of mercury. I adjusted my diet to eat a bit lower on the food chain. When we measured again six months later, my mercury levels had dropped by 50%. However, other measurements related to inflammation had doubled over the same time period.  Everything was still in the “normal” range – but a fluctuation of a factor of two struck me as worth investigating.

I use one of those fancy medical services where, for an -additional- out-of-pocket annual cost, I can use a web or mobile app to schedule appointments, renew prescriptions, and even exchange secure messages with my care team. Therefore, I didn’t have to do anything as undignified as bringing a sheaf of printouts to my physician’s upscale office on a high floor of a downtown Boston office building.  Instead, I downloaded the PDFs from Arivale and sent them in a message with my appointment request.

When we met, my physician had printed out the PDFs.  Perhaps this is part of that “digital transformation” I’ve heard so much about. The 2008 article is studiously silent on the topic of doctors bearing printouts. I’m guessing it’s okay if they do it.

Anyway, I had the same question as the obese, exercise-averse patient who drew such scorn in the 2008 article:  Is there any medical direction to be had from this data?

My physician’s answer was to tell me that these direct to consumer services are “really dangerous.”  He gave me the standard line about how all medical procedures, even minimally invasive ones, have associated risks. We should always justify gathering data in terms of those risks, at a population level. He cautioned me that going down the road of even looking at elevated inflammation markers can lead to uncomfortable, unnecessary, and ultimately dangerous procedures.

Thankfully, he didn’t call me fat or tell me to go get a gym membership.

This, in a nutshell, is our reactive system of imprecision medicine.

This is also an example of our incredibly risk averse business of medicine, where sensible companies will segment and even destroy data to avoid the danger of accidentally discovering facts that they might be obligated to report or to act on.

This, besides the obvious profit motive, is why consumer electronics and retail outfits like Apple and Amazon are “muscling into healthcare.”

The void does desperately need to be filled, but I think it’s pretty terrible that the companies best poised to exploit the gap are the ones most ruthlessly focused on the bottom line, most extractive in their runaway capitalism, and with histories of terrible practices around both labor and privacy.

A happy ending, perhaps

I really do believe that there is an opportunity here: A chance to radically reshape the practice of medicine. I’m a genomics fanboy and a true believer in the power of data.

To be clear, the cure is not any magical app. The transformation will not be driven simply by encoding our data as XML, JSON, or some other format entirely. No specific variant of machine learning or artificial intelligence is going to un-stick this situation.

It’s not even blockchain.

The answer lies in a balanced approach, with physicians being willing to use data driven technologies to amplify their own senses, to focus their attention, to rapidly update their recommendations and practices, and to measure and adjust follow ups and follow throughs.

To bring it back to our obese patient above, consider the recent work on polygenic risk scores, particularly as they relate to cardiovascular health. A savvy and up-to-date physician would be well advised to look at the genetics of their patients – particularly those of us who don’t present as a perfect caricature of traditional risk-factors for heart disease.

I’ve written in the past about another physician who sized me up by eyeball and tried to reject my request for colorectal cancer screening, despite a family history, genetic predisposition, and other indications.  “You look good,” he said, “are you a runner?”

There is a saying that I keep hearing:  “Artificial Intelligence will not replace physicians.  However, physicians who use Artificial Intelligence will replace the ones who do not.”

The same is true for using all the data available. In my opinion, it is well past time to make that change.

I would love to hear what you folks think.