Whose Genes Are They, Anyway?
You have seen the ads from companies that promise to tell you, based on your DNA, where your ancestors came from. You are eager to trace your family’s roots, so you order a test kit, send in your sample and await the results.
Your involvement with the company may end there, but two Penn State researchers say that for your DNA sequence — your genome — the journey has just begun.
What you may not realize is that when you get your DNA sequenced, in most cases you don’t own the sequence in a legal sense. The company that sequenced it does, or at least, in our current legal framework, it can act as if it does: It can sell or give your data to other organizations, which often are not bound by the agreement you signed with the sequencing company. Even if you pay for just the basic service that will allow you to sketch your ethnic background, the company may sequence your entire genome — and then pass that information along to others.
An open-ended journey
Your genome’s first stop will probably be at a research institution, where it joins a database of thousands or millions of other genomes that researchers use to pinpoint genes that correlate with specific diseases or health risks. Those institutions, in turn, may partner with businesses that use the data to develop commercial products and services they can sell, such as pharmacogenomic drugs.
“Many, many people do not understand what the potential uses of their DNA might be,” says Barbara Gray, emerita professor of business in Penn State’s Smeal College of Business.
Promise and peril
The first complete human genome sequence was published in 2003, after a 13-year international effort that involved hundreds of researchers and cost $2.7 billion. Since then, sequencing technology has gotten faster and much less costly. At the same time, the advent of supercomputing centers that can analyze and compare millions of genomes has turned the mountain of raw genomic data into a mother lode of invaluable information. In 2017, investment in genomics businesses topped $3 billion.
The promise that justifies this level of investment and excitement is health: the potential to create medical care tailor-made for each individual. That raises a privacy issue, though, because scientists looking for variants related to health and disease need access to entire genomes, not just the short segments used for general ancestry work.
Other people examining our genes would seem intrusive to many of us, but for others, like the parents of children with rare and incurable diseases, privacy is a lesser concern. Many such families freely share their genetic data and medical records in hopes that researchers will be able to identify gene variants responsible for the disease, and perhaps develop better therapies or even a cure.
The problem is that when dealing with your genetics, it is never just about you, says Gray. “The DNA for your blood relatives is very similar to yours, so when you put your data in the system, you’re not only exposing yourself, you’re also exposing your progeny, your parents, uncles and aunts, and other people in your family, who did not sign a waiver.”
Privacy and profit
To allay fears, some companies separate genome data from the name, age, gender and other personal details about the person who provided the genome. The idea is that if we just send in a saliva sample, and our name and other identifying information are kept separate from the genome data, we will be anonymous to the system and its users.
Unfortunately, says Forrest Briscoe, professor of management and organization at Smeal, as we learn more about the genetics of personal traits, it becomes more difficult to keep our genomes anonymous. Scientists recently announced that they can predict what you look like, with fair accuracy, just from your DNA sequence, and Briscoe says there is now a thriving cottage industry in creating algorithms that can identify a specific person within a supposedly anonymized collection of genomes.
Then there is the possibility of being identified in ancestry databases even if you have never had your own DNA sequenced. In early 2018, police identified the serial rapist-murderer known as the Golden State Killer by comparing DNA from crime scenes with genome information and family trees on a publicly available ancestry website. The genome data came from relatives who had had their DNA sequenced for genealogy purposes. Several other cases have been solved in a similar way but with less fanfare.
“It’s the same kind of data that you can use for biomedical research, here used for tracking someone down,” says Briscoe. “That’s kind of violating their privacy. It’s good when we’re finding killers, but it might not be good for other reasons.”
Who’s minding the store?
The physical location of DNA databases is also a concern. A single human genome contains about 7 GB of data, which means that a collection of thousands or millions of genomes runs to…almost unimaginable numbers. To store and analyze that much data requires heavy-duty computer firepower, usually referred to as “super-computing,” and massive storage space — which is almost always in “the cloud.”
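For a rough sense of scale, the arithmetic behind that statement can be sketched in a few lines of Python. The 7 GB per genome is the figure quoted above; the collection sizes and the decimal units (1 TB = 1,000 GB) are illustrative assumptions, and real storage needs vary with file format and compression.

```python
# Back-of-the-envelope storage estimate for a genome database.
GB_PER_GENOME = 7  # approximate per-genome figure quoted in the article


def total_storage_gb(num_genomes: int) -> int:
    """Return the raw storage, in gigabytes, for num_genomes genomes."""
    return num_genomes * GB_PER_GENOME


def human_readable(gb: float) -> str:
    """Format a gigabyte count using decimal units (1 TB = 1,000 GB)."""
    for unit in ("GB", "TB", "PB"):
        # Stop at the first unit that keeps the number under 1,000,
        # or at petabytes, the largest unit handled here.
        if gb < 1000 or unit == "PB":
            return f"{gb:,.0f} {unit}"
        gb /= 1000


# Hypothetical collection sizes, chosen only for illustration.
for n in (1_000, 1_000_000):
    print(f"{n:,} genomes: about {human_readable(total_storage_gb(n))}")
```

Under these assumptions, a thousand genomes come to about 7 terabytes and a million genomes to about 7 petabytes.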
But the computing cloud is housed in huge arrays of digital equipment, in specific buildings in specific countries, managed by employees of specific organizations. Data can be stored in any geographic location, but it is better to keep it close to the corporate home, or at least in the same country, where the rules are understood and the legal recourse if they are not obeyed is clear.
“Right now the fastest-growing big databases are not in the U.S.,” says Briscoe.
The newness of the field is also a concern. There are a lot of start-up businesses dealing with DNA; when some of those inevitably fail, what happens to the genomic data they hold?
The health-care model
Because researchers have a better chance of finding meaningful links if they have more genomes at their disposal, organizations from small startup companies to the National Institutes of Health are scrambling to develop bigger databases with more genomic information from more people. Their legal agreements with donors, and the security measures they employ to protect the data, are all over the map. A consortium of organizations is working to design protocols for storage and sharing of data, but to date, there is no industry-wide standard.
In medical settings, genomic data can get out the same ways that private health information systems can be breached now — sloppy office procedures, doctors sharing patient info, outright theft by someone with access. Oddly, genetic data, even that collected in a medical or scientific context, is not covered under HIPAA, the federal regulation that says health-care providers can’t reveal your medical information to others without your consent. Your doctor needs your permission to tell someone else your blood pressure, but medical researchers can send your entire genome to others without telling you about it.
There is definitely an upside to making your genome available to researchers, say Briscoe and Gray: If your DNA isn’t included, it can’t be part of what the researchers discover. But there is a dark side, as well. What happens if insurance companies or employers gain access to your DNA data? The federal Genetic Information Nondiscrimination Act, passed in 2008, bars employers from considering the genetic information of employees, job applicants or members of their families, but a bill now before Congress, H.R. 1313 (the “Preserving Employee Wellness Programs Act”), would get around that by allowing employers to “invite” employees to provide a DNA sample, and to charge those who say no up to twice as much for health insurance.
Searching for norms
Gray and Briscoe think the various stakeholders in genomics come from such different backgrounds that it could be difficult for them to agree on norms for the field. Governance that starts within the health-care system tends to reflect the patient-oriented values of that system, but the regulatory and values systems in the business world may lead to different guidelines.
Two general frameworks stake out opposing positions. On one end of the spectrum are those who advocate for the rights of individuals to control who sees their genetic information and under what conditions. On the other end are those who favor an open-source process where everyone’s genetic data is available for anyone to see. This could lead to faster advances in the field, says Briscoe, but in addition to conflicting with the notion of genomic privacy, this option isn’t likely to be popular with companies that have invested a lot in assembling their own databases. The solution, if the industry settles on just one framework, is likely to lie between these extremes.
Their study is still young, but Gray and Briscoe have already learned enough to reach one conclusion: Barring a medical emergency, they don’t plan to send in their own cheek swabs for sequencing.