Ask HN: How to be my own genetic disease researcher for my partner?
262 points by thetwentyone on Dec 7, 2021 | 83 comments
My partner was diagnosed with a rare condition/disease with unclear cause/pathology.

I feel like it would be impossible for a doctor to stay abreast of all of the possible links/data unless they focused very narrowly on a patient.

I'd like to try and fill that gap - look at the data and relay any potential links/causes to the providers.

We have the full genome in CRAM, CRAI, FASTQ, VCF, and TBI data. Is there a way that I, a medical layman but well-informed person, could leverage this data to mine for possible matching genetic variants?

e.g. I have started finding genes associated with my partner's condition on the NCBI website and in ClinVar Miner (https://clinvarminer.genetics.utah.edu/variants-by-condition)

Is it sufficient to identify variants by searching for the SNP string (e.g. "rsXXXXXX") in the VCF file?

Are there "hacker's guide to genomic analysis" resources out there?




I've been in your shoes, but for my son, who ended up being the first case ever discovered of his particular genetic disorder.

I'm happy to help.

I've written down an Algorithm for Precision Medicine that abstracts the journey all the way from diagnosis to treatment:

https://bertrand.might.net/articles/algorithm-for-precision-...

My day job is now to help patients like your partner all day every day at the Precision Medicine Institute at UAB.

Feel free to reach out to us, and we'll be happy to craft a research strategy and provide technical tips.


If you truly think this is a rare disease, it's very likely that there's just one single causal mutation in your partner's genome. The easiest way to find that is to:

1. Search for variants in that genome where the allele frequency is close to 0 in a very large population e.g. https://gnomad.broadinstitute.org/

2. Look into variant effects for those you prioritized in step 1 using https://www.ensembl.org/info/docs/tools/vep/index.html

Rare diseases are typically due to a coding mutation that alters the protein coding sequence in some significant way.
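Step 1 above can be sketched in a few lines of Python. This is only an illustration, not the workflow of any particular tool: it assumes each variant already carries a semicolon-separated annotation string, and the key name `gnomAD_AF` and the 0.0001 cutoff are assumptions you would adjust to whatever your annotator actually emits.

```python
# Sketch only: keep variants whose population allele frequency is very low.
# "gnomAD_AF" is an assumed annotation key; the cutoff is an arbitrary example.

def parse_info(info: str) -> dict:
    """Parse a semicolon-separated KEY=VALUE string; bare flags map to True."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def is_rare(info: str, max_af: float = 0.0001, key: str = "gnomAD_AF") -> bool:
    """True if the frequency is missing (never observed) or at most max_af."""
    raw = parse_info(info).get(key)
    if raw is None or raw is True or raw in (".", ""):
        # No frequency recorded. This can also mean the site was poorly
        # covered in the reference population, so check before trusting it.
        return True
    return float(raw) <= max_af
```

A variant that passes `is_rare` is only a candidate for step 2; most of them will still turn out to be benign or artifacts.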

If you need help, my contact details are on my profile. I do this for a living at a university.

rsIDs are a minefield: they change often, there are synonyms, and you probably won't have all loci properly annotated. Don't rely on them too much unless you really know what you are doing.

If it's not a rare disease, this gets considerably more difficult. Also, depending on the whole genome sequencing platform you have used, many structural variants (e.g. deletions or insertions of large chunks of DNA) won't be easy to measure.

Other comments have suggested Promethease, which will give you a bit of help if it's not a rare disease (e.g. if it's an autoimmune one, it's good at imputing HLA and finding risk haplotypes).

My whole comment is a bit of an oversimplification, but I think these suggestions are a good starting point.


This advice is spot on!

I haven't worked on this problem, but others in my graduate lab did. If you're interested in a tool that automates some of this process (takes VCF as input; filters variants based on frequency; you'll need to map disease symptoms/phenotypes to Human Phenotype Ontology [1] identifiers), some of my former lab mates developed a web tool [2]: https://amelie.stanford.edu/submit

[1] https://hpo.jax.org/app/

[2] https://www.medrxiv.org/content/10.1101/2020.12.29.20248974v...


Apologies for hijacking the top comment, I do not know where else to ask this and I found no local researchers:

My son was born 3 months ago with Poland syndrome. This came as a shock but it has also drawn me to look into the scientific literature.

While the common belief was that PS has no underlying genetic cause, there are papers suggesting that there may be one.

While studying, I ran across many anomalies on my own body (I'm his father), so minor that they were never considered relevant until now (I'm 40 and lead a normal life).

It would seem that on the right side of my body I have at least:

  - A mild case of Becker's Nevus
  - Single palmar crease 
  - A somewhat smaller shoulder blade (suggesting a Sprengel deformity?)

If the above is correct, this may be an opportunity (by studying my genome and my son's genome) to establish a link or a common cause for Becker Nevus Syndrome and Poland Syndrome - both fairly rare anomalies.

Can you suggest who may be interested in studying this?

This has no value for me or my son, however the scientific endeavour may be of value for the future.


I just wanted to wish you all the best. I really empathize. It felt unfair to discover the statistics on birth defects vs age of the parents -- there's almost a linear correlation between age and defects, and no one ever told me. I'll be 34 in Feb, and I've wondered many times whether I'd be a good dad if our kiddo pops out with a few missing pieces. (We're finally in a position where IVF is on the horizon, so it's a constant worry.)

It's incredibly inspiring to have an example like yours. Thank you so much for trying to connect with researchers to help them understand the disease, even though it "has no value for you or your son." Your attempt has a lot of value as a model to follow. I'll try to do the same thing if we end up in a similar position. Good luck!


My wife and I had kids at your age and 15 years on we're all doing very well. We know older parents, also doing well. We were told some risks increased for an older mother, but with hindsight these went from a small percentage chance of something to a slightly larger but still very small percentage. True, no one wants to be the rare bad case. If you're planning a pregnancy, I think I recall it's good if the mother takes folic acid, but I'm no scientist. https://www.cdc.gov/ncbddd/folicacid/about.html Good luck, I hope it works out.


Do you have any links about that?

I've never heard of this correlation and have friends that became parents at 50 so I am curious.

I know that the older you get, the riskier it is to have children, but had no idea it was a linear correlation.


It's ridiculously hard to discover. I tried to signal boost it at https://twitter.com/theshawwn/status/1441657590501445651 but depressing facts tend not to get much traction.

From https://news.ycombinator.com/item?id=28650922:

> A woman’s peak reproductive years are between the late teens and late 20s. By age 30, fertility (the ability to get pregnant) starts to decline. This decline becomes more rapid once you reach your mid-30s. By 45, fertility has declined so much that getting pregnant naturally is unlikely for most women.

> Down syndrome (trisomy 21) is the most common chromosome problem that occurs with later childbearing. The risk of having a pregnancy affected by Down syndrome is

> 1 in 1,480 at age 20

> 1 in 940 at age 30

> 1 in 353 at age 35

> 1 in 85 at age 40

> 1 in 35 at age 45 [2]

My jaw dropped.

I probably shouldn't claim "birth defects" in the general sense, just Down syndrome specifically. But the wording of "most common chromosome problem" seems to imply that this is a pretty reasonable inference.

Had no idea I was risking my kiddo's health so much by waiting.
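For anyone who wants to check the "linear" part, here is a small Python sketch that turns the quoted 1-in-N figures into percentages and per-year increases. The numbers are just the ones quoted above; the function names are mine.

```python
# The quoted "1 in N" Down syndrome figures, keyed by maternal age.
RISK_ONE_IN = {20: 1480, 30: 940, 35: 353, 40: 85, 45: 35}


def percent(one_in: int) -> float:
    """Convert a '1 in N' risk into a percentage."""
    return 100.0 / one_in


def slopes(table: dict) -> list:
    """Percentage-point increase per year between consecutive quoted ages."""
    ages = sorted(table)
    return [
        (percent(table[b]) - percent(table[a])) / (b - a)
        for a, b in zip(ages, ages[1:])
    ]


if __name__ == "__main__":
    for age in sorted(RISK_ONE_IN):
        print(f"age {age}: {percent(RISK_ONE_IN[age]):.2f}%")
    print("increase per year:", [round(s, 3) for s in slopes(RISK_ONE_IN)])
```

The per-year increase grows at every step, so on these figures the risk accelerates with age rather than rising in a straight line, while staying under 3% even at 45.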


Holy cow. That's pretty scary!

1 in 35 is a lot.

Thanks for that!


There is also Matchmaker Exchange, whose aim is to be a clearinghouse where very rare diseases become slightly less rare when patients can locate someone else with the same thing. So one answer to who may be interested could be someone else who has your symptoms.

https://www.matchmakerexchange.org/


Poland Syndrome may have links to other forms of syndactyly, so you may be able to narrow down some of the patterns, which could include cultural diets or local environmental factors. https://rarediseases.info.nih.gov/diseases/13181/syndactyly

Edit: This also explains a bit about gene regulation, i.e. switching genes on and off and ramping them up or down: https://www.ncbi.nlm.nih.gov/books/NBK26872/

Edit: Clues would come from looking at conditions with the same properties, like webbing; if it's exactly the same condition, treatments for these other conditions may become relevant.

There are a lot of studies on Google Scholar, going back to the 1800s. Some studies in livestock or vegetation (crops mainly) can also elicit clues because, despite the differences, some chemical reactions or end results will be the same in humans, animals and plants. You just can't change some of the chemical reactions; melatonin is one that is seen in humans, animals and plants, and it increases in darkness.


Just about any researcher who has published on it. It’s not unusual to see an article describing a single clinical case.


Can confirm this, this was essentially what my first scientific article was about. If you choose to avoid rsID and use genomic location, remember that there are different versions of the human genome, and make sure to align it with gnomAD's version.
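A rough Python sketch of what "use genomic location instead of rsID" can look like. The `VariantKey` type and helper names are made up for illustration; the point is only that the genome build is part of the key, so a GRCh37 coordinate is never silently compared with a GRCh38 one.

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    build: str  # e.g. "GRCh37" or "GRCh38"
    chrom: str  # chromosome name without any "chr" prefix
    pos: int    # 1-based position, as in VCF
    ref: str
    alt: str


def make_key(build: str, chrom: str, pos, ref: str, alt: str) -> VariantKey:
    """Normalise the chromosome name and allele case into a comparable key."""
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return VariantKey(build, chrom, int(pos), ref.upper(), alt.upper())


def same_variant(a: VariantKey, b: VariantKey) -> bool:
    """Compare two keys, refusing to compare across genome builds."""
    if a.build != b.build:
        raise ValueError(f"build mismatch: {a.build} vs {b.build}; lift over first")
    return a[1:] == b[1:]
```

This does not normalise indels (the same insertion or deletion can be written at different positions), so treat it as a starting point for SNVs only.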


This is awesome! I also do this for a living, tending more towards the oncogenic side, but what you said is perfectly fine!

I'd also love to help OP :)


I work in an adjacent area and agree this is all good advice.

OP, how did you even get the sequence to begin with? I have a friend who has an immunodeficiency which is almost certainly due to a rare genetic disorder and want to do a very similar thing. Despite contacting his physician, fellow researchers, and even my institution's president -- with friend's full cooperation -- no one is willing to pay for it.

I'm at my wit's end to the point that I'm starting to think the only viable option is paying for it out of pocket, but it's not cheap.

A question you might want to ponder is: suppose you isolate the problem to a single missense/nonsense/truncation mutation in a protein that seems likely to cause the phenotype. How do you plan to use that information? In theory, there is gene therapy, but in reality, given how much effort I have had to go through just to get this fellow sequenced -- and I'm a PhD working in genomics with a lot of contacts -- creating a custom one-off gene therapy solution seems like it would be a tremendous undertaking.

There is a very difficult problem here in that rare or "personalized" disease treatments are: A) not profitable, so drug companies have no interest, B) there are mountains of paperwork, IRBs, consent waivers, etc, involved in developing an experimental therapeutic, C) by definition you cannot do a proper clinical trial on a one-off, and D) it requires several different types of expertise to pull such a thing off. Sadly this means that it almost never happens, even though I suspect there are a lot of severe and lifelong genetic disorders which could be diagnosed and treated with technology available today.

Based on my experience so far, I suspect that even if you were to hand his physician very strong evidence that "the problem is caused by this specific single mutation", the response will be "OK, thanks". You should not make strong assumptions about them being able to take it from there. All this is based on the best-case scenario of it being a single variant in a coding region; if the disorder is caused by multiple variants at different loci, anything you find will probably not be actionable.


> OP, how did you even get the sequence to begin with? [...] no one is willing to pay for it.

My neurologist ordered sequencing for me from Invitae, to determine the subtype of Ehlers-Danlos I have and rule out neuromuscular diseases. She said insurance usually covers it, and it's only a few hundred bucks if they don't. Invitae appears to do WGS for such panels. I've also heard of Nebula Genomics offering affordable WGS and exome sequencing.

She said she'd take a look at the results, and if anything popped out as unusual, I'd see a geneticist.

> A question you might want to ponder is: suppose you isolate the problem to a single missense/nonsense/truncation mutation in a protein that seems likely to cause the phenotype. How do you plan to use that information?

Identify the molecular pathway involved and see if there are any drugs available that might modulate it in a therapeutic way. You might also identify similar diseases that might share similar treatments, once you know the etiology.

Once a mutation or gene responsible is identified, other patients can be as well, which can slowly lead to mouse models and clinical trials etc.


This. It's not my field but I strongly suspect I have a troublesome mutation of some sort. If I was able to track it down what good would that do? (I almost certainly inherited it which is why I think it's genetic.) Sequencing isn't that expensive these days, I keep thinking about it but even if I found enough others to pin down the mutation what good would that do? "Success" would simply be an accurate diagnosis, nothing more.


Not OP, but I got my whole genome sequenced from sequencing.com. It is currently around 399 USD. The sequencing is done by Nebula Genomics, nebula.org, where it is currently 299 USD.

You get to keep all the raw data from the sequencing and you will get some reports included. I did it in order to do a genetic disease screening as I was very sick with strange symptoms for a long time.


Thanks to you, and others, for sharing. I hadn't yet resorted to looking in the consumer space. In research (and presumably clinical)-land, the costs are substantially higher.

I'll be looking into it further to figure out whether there is some tradeoff here, or if it is just typical cost bloat for medicine/academia.


I haven’t done very thorough research, but my impression is that there is a considerable cost bloat if you go through hospitals and similar, but that the genome sequencing is essentially the same.


Not OP. Last year, Dante Labs ran a deal around Rare Disease Day (Feb 28). After submitting my documentation, I got 30x WGS for €260.


As an alternative to step 2, you could also score the filtered VCF file (filtering is key to getting rid of most of the initially millions of variants) with a variant prioritization framework like CADD (cadd.gs.washington.edu) to immediately highlight 'interesting' variants. Highlighting certain genes can definitely help, though it is not always conclusive.

(I do not want to demotivate but please be cautious. Even the best analysts have 'only' around 50% case-solve rate. If it is an adult-onset-disease, chances can be lower as the disease mechanism in that case may not be 'consequential' enough to be naturally selected against)


Great guy spotted above :)

Love your offer to help.


I do not want to be negative, but are you sure this is the best use of your time? Hackers always see a problem to chase, but real life may work differently. I still remember how HN folks were trying to design ventilators out of common parts that would kill 100% of patients.

Genetics is very hard, and in a very good case you may get a 10% correlation. Then you will have to convince specialists to chase this weak possibility...

Working on this will consume your time and put you under stress. This energy could be spent on your partner instead. You will need a lot of energy if it progresses.

Also, there is always a big chance of misdiagnosis. Simple stuff like a food allergy can be mistaken for many illnesses. Perhaps the best first action is to verify this diagnosis. Get a second opinion. Or change the environment to rule out common triggers.


I for one commend OP for having a defiant heart.


Plus HN already cured one disease: https://news.ycombinator.com/item?id=8050106


Not quite cured, so much as shed light on.


Never hurts to ask and try (as long as not blindly).


This was also my first thought. While it's never the intention, there's real risk of making things about yourself/the project instead of your partner. Not because you rank one as more important, just because it's a more "known" thing that feels more comfortable.

Nothing wrong with research...if there are existing tools within reach (and it seems like there are) then it seems like it'd be interesting and possibly helpful to dive in. But I'd strongly encourage you to timebox it as a project so it doesn't grow into something unhealthy.


I think time with a partner is better spent on cheerful things than revolving around their disease. I would feel uncomfortable at some point if everything in my life is now about the disease.


Matt Might chronicled experiences with his son's rare genetic disease. Perhaps that will give you some ideas.

Hunting down my son's killer: https://matt.might.net/articles/my-sons-killer/

You may also want to email him. Anecdotally, I believe the rare disease research community is small and willing to listen to outliers.

University department page: https://www.uab.edu/medicine/pmi/matt-might

(Edit: fixed urls, typos, grouping.)


There was another really good one on HN a while back about a husband doing research to try to save his wife from FFI.

This was it and seems relevant: http://www.cureffi.org/2019/04/29/financial-modeling-in-rare...


Also this woman who identified her own disease, her father's, and a champion athlete's: https://www.propublica.org/article/muscular-dystrophy-patien...


Seriously, 43 points 6 years ago??? Resubmit this already.


That was an excellent read.


Yeah, I think your best option is to identify the protein containing the mutation. Then contact experts studying that protein. Most researchers love to hear from outsiders.


If it were me, I wouldn’t be willing to go into in-depth discussions with patients, since that could lead to unintended consequences, like a patient self-medicating based on what I say even if I didn’t intend to give any advice.


I'm doing the exact same thing, although with my own genome. Things I have done:

- Annotate the variants with their frequencies. You can do that with Ensembl VEP. There's an official Docker image on their website that I suggest you use. You will have to run "./INSTALL.pl" to download some files (they call them caches) and then "vep -i /genome.vcf --af --max_af --af_1kg --af_esp --af_gnomad -o genome_with_freqs.vcf --cache".

- If you know a specific region of the genome you want to look at, you can use tabix to extract all the variants in it. For example: "tabix genome.vcf.gz chr16:82,624,969-83,802,640 > cdh13_variants" will extract all the variants in that specific region.

- Use igv to browse the variants visually.
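If you'd rather stay in Python than learn tabix right away, the region extraction can be approximated as below. This is a sketch, not a replacement for tabix: it scans every line instead of using the index, and it only checks the POS column rather than whether a longer variant overlaps the region. The function names are mine.

```python
import gzip


def parse_region(region: str):
    """Turn 'chr16:82,624,969-83,802,640' into ('chr16', 82624969, 83802640)."""
    chrom, _, span = region.partition(":")
    start_s, _, end_s = span.replace(",", "").partition("-")
    return chrom, int(start_s), int(end_s)


def variants_in_region(lines, region: str):
    """Yield the tab-split fields of VCF records whose POS lies in the region."""
    chrom, start, end = parse_region(region)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield fields


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")
```

Usage would be something like `with open_vcf("genome.vcf.gz") as fh: hits = list(variants_in_region(fh, "chr16:82,624,969-83,802,640"))`. The chromosome name has to match the file's naming ("chr16" vs "16").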

In general, the tools you will come across aren't very intuitive and their CLIs aren't good. Prepare to spend a decent amount of time making sense of everything. Honestly, if you know of a family member of hers who happens to have similar symptoms, I would say the best thing you could do is to sequence that genome and see where they overlap. In any case, read some studies, find the loci where the variants cause problems, and then extract the variants.

EDIT: On this last point: the cool thing about igv is that you can open vcf files and see only the variants; you can search for variants ("rs123456" in the search field) and it will show the variant and its surroundings; you can search for "chr16:82,624,969-83,802,640" and it will limit what's visible to that region; you can search for a gene and the search field will show you the region of the genome that the gene spans, which you can use for tabix later. You will often see in studies something like "variants in locus p13.12 of chromosome 16 were shown to have an effect in...". Right below the search bar you can see all those loci (p13.3, p13.2, etc.). Good luck! If you add your email in your profile, I will contact you.


Hello, thank you for the offer to contact me. I do have a couple of questions that it seems like you might be able to help with. My email is in my profile.


Biology and hence medicine is complex. You can not grok it in a day. You can not synthesize most drugs at home and you can not buy them without a license. There are therapies, side effects, treatments for side effects, interactions, ... It may be impossible to become an expert and find a cure in time for your partner. This is not your fault.

I strongly recommend that you start by consulting with multiple doctors until you find one or more who shows interest in your partner's case, understands probable causes, and demonstrates useful expertise in treatment. You may need to visit a larger hospital that is involved in research. Work the system. You are a recruiter.

Pick the primary doctor who will coordinate your partner's treatment. This doctor will be your partner's primary line of defense, and they will mentor you in any investigations you do. They will guide you through the maze of therapies, palliative care, research, social workers, and other forms of support. They will connect you to other specialists who can help.

Good luck!


Wise reply. Even simple Mendelian genetics is complex. OP would love a therapeutic intervention. That will require working collaboratively with motivated health care experts.


Not directly an answer to your question, but if you're not already aware of it, another thing that might be worth pursuing in parallel is having your doctor refer you to the Undiagnosed Disease Network, if the case meets the criteria for that [1].

Another avenue might be a crowdsourced rare disease research organization like [2]

I have no relation to any of the above but read a book about the UDN that may be of interest to you: [3]

[1] https://undiagnosed.hms.harvard.edu/apply/

[2] https://www.researchtothepeople.org/

[3] https://www.goodreads.com/en/book/show/53317420-the-genome-o...


Stepping back a bit further from the specific question, a PCP isn’t the best partner for a rare disease, as you’ve noted as well. As you're seeing in the responses, there are many well-informed people willing to help. There are typically researchers, associations, and niche communities that will help one keep up with the latest research. If you share more specific information, I or someone else may be able to help direct you towards resources.



As a medicinal chemist (obviously biased), I would focus more on treatment and alleviation than the genetic component. This would usually mean looking at the metabolism or how the genetic mutation alters it. Is your partner lacking an enzyme, certain metabolites or intermediates etc.? These can at times be supplemented either directly or indirectly through food or vitamins/co-factors.


You may want to add an email address to your HN profile. Or have some method of contact listed.

I do something very similar in a research lab, and while it’s possible to make decent headway in this without much training, there are dragons all over the place.

For example, you didn’t mention which reference genome was used for the alignment/variant calling. Unless you use the right version for annotation, you’ll just get junk annotations that won’t make sense. I’ve only seen a couple of comments mention this.
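One quick sanity check, sketched in Python: many VCF headers list contig lengths, and chromosome 1 is 249,250,621 bp in GRCh37/hg19 but 248,956,422 bp in GRCh38/hg38. The function name is made up, and the check assumes your header has `##contig` lines at all (not every file does), so fall back to asking the sequencing provider if it returns "unknown".

```python
import re

# Length of chromosome 1 in the two reference builds most files use.
CHR1_LENGTHS = {249250621: "GRCh37/hg19", 248956422: "GRCh38/hg38"}


def guess_build(header_lines) -> str:
    """Guess the reference build from the chr1 ##contig line of a VCF header."""
    for line in header_lines:
        if not line.startswith("##contig="):
            continue
        is_chr1 = re.search(r"ID=(?:chr)?1[,>]", line)
        length = re.search(r"length=(\d+)", line)
        if is_chr1 and length:
            return CHR1_LENGTHS.get(int(length.group(1)), "unknown")
    return "unknown"
```

Whatever this reports should match the build of the annotation resources you use (for example, the gnomAD dataset you query).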

If you don’t have the background, you might also need a crash course in human genetics, inheritance, molecular biology and variant functional prediction. You don’t need to become an expert, but you will need a working knowledge so that you know which variants to ignore. There really should be a small handful that would potentially make sense as a causal variant.

If the condition is sufficiently rare, you may not find clinical annotations, so be prepared to look a little deeper.

Best of luck.


Bioinformatician here. What are you hoping to find? If you just want to search for variants, and your VCF file is germline variants and annotated appropriately, yeah that will work. You may need to annotate the VCF.

As far as discovering new associations or causal relationships from a single WGS, you are probably not going to have any luck there.


Specific plug for Will Byrd's mediKanren and Matt Might's group in general: they are the real deal. (Source: I have worked for other deals.)

For more just start learning the tools... I have not checked in on them for years now but "BioStars Handbook" was up & coming

[] https://github.com/webyrd/mediKanren

[] https://biostar.myshopify.com/


If the disease is genetic (error in a gene) or genomic (error in the large scale structure of the genome), a very important thing to do would be to also sequence the parents (preferably WGS [Whole Genome Sequencing]). Doing such a trio sequencing experiment allows you to determine, for every gene and even base pair, which parent it came from, and this allows you to assess what is normal (assuming the parents are not affected). This way you may find out if your partner is homozygous for an aberration both parents are heterozygous for. This could mean that this is THE affected gene/chromosome. You may also find what we call "de novo" aberrations: these are aberrations that are not present in the parents at all and arose in your partner. These aberrations are very suspect (again, because both parents are not affected).

You may find something, or you may not. A lot is unknown: regions outside of the genes may be affected and may even be the cause of the phenotype, but we still understand very little of this.
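To make the trio logic concrete, here is a minimal Python sketch that classifies one site from the three VCF genotype (GT) strings. The labels and function names are mine, and it assumes both parents are unaffected, as in the paragraph above.

```python
def alleles(gt: str):
    """Split a GT string such as '0/1' or '1|1'; return None if any call is missing."""
    parts = gt.replace("|", "/").split("/")
    return None if "." in parts else parts


def is_het(gt_alleles) -> bool:
    """True if the genotype carries more than one distinct allele."""
    return len(set(gt_alleles)) > 1


def classify(child: str, mother: str, father: str) -> str:
    """Label one site in a trio, given the three GT strings."""
    c, m, f = alleles(child), alleles(mother), alleles(father)
    if c is None or m is None or f is None:
        return "uncallable"
    alt = [a for a in c if a != "0"]
    if not alt:
        return "reference"
    # A non-reference allele seen in neither parent.
    if any(a not in m and a not in f for a in alt):
        return "de_novo_candidate"
    # Child homozygous for an allele that both parents carry once.
    if len(c) == 2 and c[0] == c[1] and is_het(m) and is_het(f):
        return "homozygous_from_het_parents"
    return "inherited"
```

In real data most apparent de novo calls are genotyping errors, so anything this flags still needs read depth and quality checks before it means anything.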

Depending on where you live, genomic counseling is free and trio sequencing is usually part of it.

This is not really my expertise (more in oncology) but feel free to ask more questions.

If you have BAM (or CRAM + reference genome) files for parents and your partner, you could download a trial of VarSeq [0] to do a more GUI based analysis of the results.

"Is it sufficient to identify variants by searching for the SNP string (e.g. "rsXXXXXX") in the VCF file?" If the variants have been associated with the same phenotype as your partner's, then yes, it is interesting. If there is no phenotype, perhaps you can track down the source publication and try to talk to the authors.

There are probably groups online with people in the same situation, try to find them, they can probably help you a lot more.

[0]: https://www.goldenhelix.com/products/VarSeq/index.html


Maybe to add, from a personal perspective, try not to lose yourself in this. I felt the same when my mother was diagnosed with cancer... I was the expert, I needed to save her. But my mother needed to have faith in the system and no second guessing. She trusted her oncologist. I backed off, they got her to the point of being cancer free for 3 years now against all odds.

Admittedly, your situation is different. Still, your partner may need you more as a supporting, fun, optimistic person rather than the miserable piece of human you can become from a bottomless rabbit hole like genomics, where the answer to your partner's problem may forever seem like "almost within your grasp".


A final thing to add: I didn't see anyone here mention trio WGS, which, I think, is the way to go for unknown hereditary conditions. Perhaps an expert can confirm, but it will narrow your search by a huge factor.


Other folks have mentioned CureFFI, but I would like to second that suggestion. This is a couple that has committed their lives to understanding and curing a rare condition that personally affects one of them.

https://www.cureffi.org/about/

Sonia: https://www.broadinstitute.org/bios/sonia-vallabh

Eric: https://www.broadinstitute.org/bios/eric-minikel

They are also hiring: https://broadinstitute.wd1.myworkdayjobs.com/broad_institute...


Feel free to reach out, username at berkeley edu.

In open source bioinformatics we strive for reproducible science, which can be difficult in a field with tons of different methods and tools. One approach is to use a workflow language such as Nextflow [0] and Docker/Singularity such that the entire analysis is reproducible, see e.g. [1].

There is a vibrant community around Nextflow workflows called nf-core [2] which has a rare disease workflow in development [3], come join our slack!

[0] - https://nextflow.io

[1] - https://github.com/brentp/rare-disease-wf

[2] - https://nf-co.re

[3] - https://nf-co.re/raredisease


Promethease [1] is a great resource

[1] https://www.promethease.com/


Hi all, appreciate the input! I may respond to specific comments but to address some general comments/questions:

- Of course I want to spend time with my partner and don't see this as the "way to fix everything".

- The literature surrounding my partner's disease calls it a "rare disease", but the number of patients in the US is in the tens of thousands. I'm not trying to find a new gene/SNP associated with the disease, just to reference against research that has been done by others.

- The diagnosis is FSGS, and there is a history (since childhood) of high cholesterol.


If there is literature, two places to start would be PubMed (pubmed.ncbi.nlm.nih.gov) and Google Scholar. You may also search for communities where people with the disease hang out (subreddits, forums, Facebook groups, etc.) and learn about lifestyle and common problems from them. If you know what it is, that just makes it a matter of figuring out what type of info you want and optimizing your search for that. Scihub would also be a good place to look.

One way to easily find good research papers about it is to go to the Wikipedia page and look at the citations. They tend to cite the larger summaries, and they'll sometimes be available on PubMed; if not, you may need to look at scihub for the full text. That's a good introduction to the research paper side of information.

One thing I've also heard is that if you email the author of a research paper, they'll often send it to you for free. While distribution is through expensive journals, researchers can and often want to distribute their papers freely to interested parties. Just another alley to search.


maybe call around and try to find a genetic counselor who would be willing to work with a patient who has the resources and capability to contribute to research efforts.

would recommend trying to find supervision from an expert rather than just diving in alone. every field has its nuance.


We don't have to do everything ourselves.

Maybe you could find researchers working on this topic (your disease or the generic problem of identifying causal mutations for a disease) and pay them to work on your case?

You might fight your own parking ticket but you'd get a lawyer for your murder defence...

Computer people get paid a hella lot more than university medical research people, especially those outside big cities or in Europe or Asia. It's more efficient to work hard at making money and hire a few experts.


You can do this. Yes. I assure you.

I'm in a not too dissimilar position, although a better known one.

It wasn't clear to me until recently what an astounding amount of good scientific results exist out there that are accessible on a device that's probably in your hands right now.

It's also been a surprise how useful Twitter is. Find good scientists that have real responsibilities toward the truth, are doing sound research, and talking on twitter to try to get the dialogue going. This kind of person is a very very useful link to the extant knowledge. And there's A LOT of it.

Some subreddits are also surprisingly deep.

It's a question of separating the wheat from the chaff. But it's possible. It is possible. You can do this.


Unfortunately, this is very difficult to do, because in addition to the mutation that is causing the disease, your partner has hundreds of other mutations that are just part of the normal background. Based on my understanding of cases where the genetic basis of a rare disease is discovered, the best thing you can do is to find a physician who has studied others with a similar pathology. It's hard to learn much from one patient, but as soon as you have a few more, things can progress much more rapidly. The tricky part is finding that physician. There are some medical centers that specialize in rare genetic diseases; you might start there.


You can use algorithms to prioritize the variants in the VCF file based on how damaging they are predicted to be, for example VAAST http://www.yandell-lab.org/software/VAAST/VAAST_Quick-Start-... or Exomiser. There are ways to bias the search using phenotypes that you know of. The issue with using existing databases is that you will never find a novel variant that may be causative. There are companies that provide this service (e.g. Fabric Genomics) and they may be willing to help.


A few notes to help you in this journey.

As for the data, I assume you've done Illumina sequencing. Your files are as follows:

FASTQ: short reads of the genome. CRAM: the reads aligned against a reference genome. VCF: small (probably <50bp, mostly SNPs) variants between your partner's genome and the reference, including your partner's genotypes. (We are diploid, so there are two homologous copies of almost all loci in the genome. A variant can therefore be homozygous, with the same allele on both copies, or heterozygous, with different alleles.) The other files (CRAI, TBI) are index files that trivially describe the layout of these.

A substantial fraction of rare genetic disease (maybe 20%) relates to alleles found in the exome (the portion of the genome that directly codes for proteins in a 1:1 manner). You can look for rare variants that have significant effect on proteins. In most Illumina data sets, the significant majority of these will be genotyping or variant detection errors. Even ones that seem to lie in genes that are important for the etiology of your partner's phenotype are likely to be errors.

Other posters have linked to tools that might help you predict the effect of given variants. You might also look at the variant effect predictor (VEP): https://grch37.ensembl.org/info/docs/tools/vep/index.html. This will classify the predicted effects of variants based on extremely detailed annotations of the genome. You can then find variants with high effect that are rare or nonexistent in the observed human population (using gnomAD). Rare variants of highly deleterious functional effect with allele frequency >0% that your partner has in homozygosis may be candidates to follow up on. You will also want to look for variants that have AF=0% in the larger population and high effect size and that your partner has in heterozygosis (they could be "dominant").
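To make that last filtering step concrete, here is a minimal sketch in Python. It assumes you have already turned the annotated variants into simple records; the `Variant` class, its field names, and the 0.001 frequency cutoff are all made up for illustration and are not part of VEP or gnomAD.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    gene: str
    impact: str                 # predicted effect class, e.g. "HIGH" or "MODERATE"
    gnomad_af: Optional[float]  # population allele frequency; None = not seen in gnomAD
    genotype: str               # "het" or "hom" in the patient


def is_candidate(v: Variant, max_recessive_af: float = 0.001) -> bool:
    """Apply the two rules described above (the threshold is an assumption)."""
    if v.impact != "HIGH":
        return False
    af = v.gnomad_af or 0.0
    if v.genotype == "hom":
        # Possible recessive cause: rare (or absent) in the population.
        return af <= max_recessive_af
    if v.genotype == "het":
        # Possible dominant cause: never observed in the population.
        return af == 0.0
    return False
```

Anything this keeps is only a lead to follow up on, and per the caveat above many such hits will turn out to be sequencing or genotyping errors.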

My impression is that most rare genetic disease is related to structural variation. This lies outside the scope of the short read resequencing which you've done. We don't even yet know the magnitude of this, because there are so few truly de novo assemblies of rare disease patients. The required technology has only come online in the past two years.

Between problems of observation of the genome and interpretation of the significance of variants, your job is not going to be easy. You will be confused by the signals you get, and probably follow many incorrect leads. Good luck.


> look at the data and relay any potential links/causes to the providers.

I’d imagine this will annoy doctors just as much as a patient starting the consultation with “so I was Googling”.


Here's a post I made on reddit about how to do exactly this:

https://www.reddit.com/r/Nebulagenomics/comments/nhjfpa/how_...

You use the VCF and a Java project called the Exomiser, and it will give you output files with all the pathogenic variants marked.

In my case, as with a lot of rare diseases, you can have unique pathology and mutations in a certain gene that don't show up as pathogenic in ClinVar. For example, my family has a lot of autoimmune diseases and, as expected, my HLA genes are totally trashed. However, none of these mutations have ever been seen and flagged before, especially as WGS is so new.

If you only have a list of genes (and the Genomiser will give you a list of the genes that are the most heavily affected), you can put them into this app to get some further data and an idea of what kind of tissue expression or rare disease spectrums you may be dealing with: https://maayanlab.cloud/Enrichr/

You can make informed decisions based on it. For example, I have a defect in my thiamine transport gene, so now I follow a B1 megadose protocol. A lot of people do that with the basic 23andMe methylation reports, but this is more in depth. So in my family maybe this looks like parkinsonism, autism, diabetes, muscle disease, metabolic syndrome, but we're understanding these diseases to be more like mitochondrial diseases that are more systemic. The answers you get are often really just too cutting edge for GPs or even specialists to deal with, and you have better luck just researching, biohacking, or talking to a naturopath. Genetics doesn't really have that place in general practice yet unless it's something like a very, very clear pathogenic marker, which honestly isn't the case a lot of the time. Alternatively, you end up having a "pathogenic" marker that we had no idea even exists in people who aren't gravely disabled. For example, I don't have lissencephaly regardless of what my pathogenicity annotation says. Instead, in that case you look at the gene and see the big picture, which is that it's linked to neurodevelopmental disorders, and I have autism, so that could be a factor there. But autism != lissencephaly. WGS is so new.

Sadly, the reality is that you can have all that and it almost puts you at a disadvantage with doctors, because you look crazy and sus claiming you have some HLA mutation or whatever. Who told you that? Oh well, I data mined it... uh huh, sure... honestly to get it back into the medical system and to be taken seriously you'd probably have to get a doctor to retest it; for example, I can spin this up to get my HLA alleles from my FASTQ: https://github.com/nf-core/hlatyping

But no doctor is going to put that in my medical record until I convince them to run a blood test for the same damn thing.

If anyone wants to help me with my own genetic search woes, or knows solutions, please let me know. If you want to help me publish or add to that guide somewhere, let me know. I asked Nebula if they wanted to print it on the blog and they said they'd be interested, but I just never cleaned it up.


> honestly to get it back into the medical system and to be taken seriously you'd probably have to get a doctor to retest it

The thing is, if you sequence your genome, you might have used a protocol that has a label on it: “Not for diagnostic purposes, research only”. A doctor shouldn’t take anything that’s not IVD-certified seriously (and for good reasons).


Given that the last 50 years have seen an explosion of chronic conditions (inflammation-linked for the most part, which isn't necessarily obvious) tied to unknown environmental causes, I'd start examining those, including occult motion sickness due to sensory conflict (TV panning, zooming, passive transportation, etc.). Even if there's a genetic trait, chances are the environment is involved. Don't neglect PubMed.


I don't know much about the field but your post reminded me of this interview of Martine Rothblatt on the Tim Ferriss Show - https://tim.blog/2020/12/17/martine-rothblatt-transcript/

Her story might give you some pointers. All the best to you and your partner.


In parallel to your own research, consider enrolling in a study like IDIOM at Scripps Health: https://www.scripps.edu/science-and-medicine/translational-i...


The global locus outside of the USA for rare disease research is Orphanet https://www.orpha.net/consor/cgi-bin/index.php I would start there to find people and companies working on your partner's disease. No idea if there's a US equivalent.


OP, besides other good recommendations in the thread, you can give Moon [https://www.diploid.com/moon] a shot. In my WGS, it found an SNP with allele frequency of 0.00089 that is designated as "probably_damaging" and "deleterious" in gnomAD.


Which condition is this?


If you believe it is a rare disease and your doctor has confirmed so, one website you can try is https://rarediseases.info.nih.gov/. It might or might not be genetic or familial.


To add to what others have said: if they have blood relations who are willing to get sequenced, you may be able to narrow things down by looking for a recessive gene in both parents not shared by any siblings, or otherwise looking at inheritance patterns.


Some great advice here already. It is a little hard to be more specific without knowing what the disease is.

An important first step is to consider the probability that this is a condition with a genetic basis, based on what is already known about it.


I'm working as a researcher/bioinformatician looking for the genetic causes of rare diseases. Your chances of success are ... complicated. However, there are various things you can do.

I'll second the suggestion to use Exomiser, or its more expansive version called Genomiser.

No, it is not sufficient to look up the RS numbers in the VCF file. There are two reasons for this:

1. The RS number just refers to the location. Different variants can exist at one location, so you aren't necessarily finding the same variant. Variants need to be matched by location and by the change that they cause.

2. RS numbers are typically given to locations that have common variants, although there are numerous exceptions. It is a universal rule of genetics that a rare monogenic disease cannot be caused by a common variant. This fact was so obvious, but it needed to be published [0] before people started taking it seriously. So most likely the variant that is causing the disease does not have an RS number.
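As a rough illustration of point 1, this is a Python sketch of matching a VCF record against a known variant by location and change rather than by RS number. The function names and the `known` set are hypothetical, and a real comparison would also need consistent chromosome naming ("1" vs "chr1"), the same reference build, and normalized indels, none of which is handled here.

```python
def variant_keys(vcf_line):
    """Return one (chrom, pos, ref, alt) key per alternate allele on a VCF data line."""
    chrom, pos, _rsid, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    # A single line can list several ALT alleles separated by commas.
    return [(chrom, int(pos), ref, a) for a in alt.split(",")]


def matches_known(vcf_line, known):
    """True if any allele on this line is exactly one of the known variants."""
    return any(key in known for key in variant_keys(vcf_line))
```

Two records can share the same ID column and still differ in the ALT allele, which is why the ID is ignored entirely in this sketch.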

The main problem you will face is the sheer quantity of data that you have been given. The average person has something like 3 million variants, so you need a way to whittle these down to a short list. The first thing you need to do is get rid of all the common variants, for the reason stated above. The easiest way to do this is to annotate the variants using software like VEP, Annovar, or alamut-batch. I'd recommend VEP because it is good, popular, and free. That will include in its output whether the variant has been found in the GnomAD project [1], which is a conglomeration of thousands of genome sequences, and can therefore say whether the variant is common or rare. For the variant to be considered rare, it shouldn't be present in GnomAD more than a couple of times.
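For what it's worth, here is a small Python sketch of reading VEP annotations back out of an annotated VCF and dropping the common variants. It assumes VEP was run with VCF output, which puts the annotations in a `CSQ` entry of the INFO column and declares the field order in the header. The frequency field name (`gnomADe_AF` here), the example cutoff, and the function names are assumptions; which fields you actually get depends on the VEP version and options used.

```python
def csq_fields(header_line):
    """Read the CSQ field order from the '##INFO=<ID=CSQ,...' header line."""
    fmt = header_line.split("Format: ", 1)[1]
    return fmt.rstrip('">\n').split("|")


def parse_csq(info, fields):
    """Turn the CSQ entry of an INFO column into one dict per annotation."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            # One comma-separated record per transcript/consequence.
            return [dict(zip(fields, rec.split("|"))) for rec in entry[4:].split(",")]
    return []


def is_rare(annotation, af_field="gnomADe_AF", max_af=0.0001):
    """Treat a missing frequency as 'not seen in gnomAD', i.e. rare."""
    raw = annotation.get(af_field, "")
    return raw == "" or float(raw) <= max_af
```

This ignores corner cases such as several frequencies packed into one field, so treat it as a starting point rather than a finished filter.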

Once you have the variants annotated, you should know for each variant whether it is inside a gene, which gene that is, and whether the variant has an effect on coding. If a variant is intronic, it is unlikely to be pathogenic (although it is never that simple). Common mechanisms of pathogenicity are:

1. If the variant changes the protein code (a missense variant). These are hard to interpret - they may be pathogenic but most are not.

2. If the variant changes the length of the coding DNA by a multiple of three (an in-frame indel), which inserts/deletes amino acids from the protein. These are slightly more likely to be pathogenic than missense variants, but most are still not.

3. If the variant changes the length of the coding DNA by something other than a multiple of three (a frameshift indel). This messes up the frame of the three-base code of the gene, making the rest of the gene gibberish. These are much more likely to be pathogenic, but only if the gene itself is actually important.

4. If the variant changes a protein codon into a "stop" codon (a "stop gain" or "nonsense" variant). These are as likely to be pathogenic as a frameshift.

5. If the variant interferes with splicing (a splicing variant). These variants are on the borders of the exons of genes and may change the way that the introns are cut out of the gene before translation into protein. These are fairly likely to be pathogenic.

The annotations should tell you which of these things a variant might be. A synonymous or intronic variant that doesn't affect splicing is very unlikely to be relevant.
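The in-frame versus frameshift distinction in points 2 and 3 comes down to simple arithmetic on the REF and ALT alleles. A toy Python sketch (the function name is made up, and it only makes sense for variants that lie entirely within coding sequence, which the annotation software checks for you):

```python
def indel_frame_effect(ref, alt):
    """Classify a coding variant by how it changes the length of the sequence."""
    diff = abs(len(alt) - len(ref))
    if diff == 0:
        return "substitution"  # same length: missense, nonsense or synonymous
    if diff % 3 == 0:
        return "in-frame indel"  # whole codons added or removed
    return "frameshift indel"  # reading frame shifted downstream
```

For example, `indel_frame_effect("A", "ACGT")` is an in-frame insertion of three bases, while `indel_frame_effect("ACG", "A")` deletes two bases and shifts the frame.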

You need to determine whether the disease is likely to be recessive or dominant. Recessive means that you need to have both copies of the gene broken in order to get the disease, whereas dominant means that you need just one copy broken in order to get the disease. If you look up the disease or gene in ClinVar or OMIM [2] you can often find whether the gene is recessive or dominant. If it is recessive, you either need to find two pathogenic variants in the same gene that are heterozygous, or you need to find a single pathogenic variant that is homozygous. In the VCF file, a variant is heterozygous if it says "0/1" and homozygous if it says "1/1".
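Reading the genotype out of a VCF line is easy to script. A minimal Python sketch, assuming a single-sample VCF where the FORMAT column is the ninth and the sample is the tenth (the function name and return labels are mine):

```python
def zygosity(vcf_line):
    """Return 'het', 'hom_alt', 'hom_ref' or 'missing' for a single-sample VCF line."""
    cols = vcf_line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":")   # e.g. GT:DP
    sample_values = cols[9].split(":")
    gt = sample_values[format_keys.index("GT")]
    # Phased genotypes use "|" instead of "/"; treat them the same here.
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if all(a == "0" for a in alleles):
        return "hom_ref"
    if len(set(alleles)) == 1:
        return "hom_alt"  # "1/1" (a single allele such as "1" also lands here)
    return "het"
```

Note that "1/2" (two different non-reference alleles) is reported as `het` by this sketch, which may or may not be what you want.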

By far the easiest way to narrow down the extremely long list of variants is to do an inheritance analysis. If you are able to perform genome sequencing on both of the patient's parents, then you have more power. Namely, any variant that is heterozygous in one of the parents can't be causing a dominant condition in the patient if the parent is healthy. Any variant that is homozygous in one of the parents can't be causing a recessive condition in the patient if the parent is healthy. So, immediately reject any variant that is homozygous in one or both of the parents. Next, identify the variants that are only in the patient and not the parents. These are "de novo" variants - they arose in the patient as a copying error from the parents' DNA. A large proportion of rare genetic diseases are caused by de novo variants.
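The trio logic above can be sketched in a few lines of Python. Genotypes are encoded here as the number of copies of the variant allele (0, 1 or 2), both parents are assumed to be healthy, and the function name and labels are invented for the example.

```python
def classify_trio(child, mother, father):
    """Classify one variant from alt-allele counts (0, 1 or 2) in a trio with healthy parents."""
    if child == 0:
        return "not_in_patient"
    if mother == 2 or father == 2:
        # A healthy parent carries two copies, so reject the variant.
        return "rejected_parent_homozygous"
    if mother == 0 and father == 0:
        # Present in the patient but in neither parent.
        return "de_novo"
    # Carried by at least one healthy parent: only fits a recessive model.
    return "inherited"
```

In real data a large share of apparent de novo calls are genotyping errors in one of the three samples, so each one needs checking against the read alignments.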

Other types of inheritance are:

1. Compound heterozygous - in this case one parent has one variant, and the other parent has the other variant, both in the same gene, and the patient has inherited both of them.

2. Homozygous - if both parents have the same heterozygous variant.

3. X-linked - if the patient is male, he has only one copy of the X chromosome. The mother may have a heterozygous variant on the X chromosome and be fine because of her second working copy of the gene, but pass the broken copy on to the patient. The father must not have this variant and be healthy.

There are more.
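Compound heterozygosity (type 1 above) is the one that is awkward to spot by eye, because it involves two different variants in the same gene. A hedged Python sketch, assuming each input is a rare, potentially damaging variant given as a `(gene, child, mother, father)` tuple of alt-allele counts; the function name and tuple layout are made up for the example:

```python
def compound_het_genes(variants):
    """Return genes with one heterozygous variant from each parent.

    `variants` is an iterable of (gene, child, mother, father) tuples, where the
    last three values are alt-allele counts (0, 1 or 2).
    """
    maternal, paternal = set(), set()
    for gene, child, mother, father in variants:
        if child != 1:
            continue  # only heterozygous variants in the patient count
        if mother == 1 and father == 0:
            maternal.add(gene)
        elif father == 1 and mother == 0:
            paternal.add(gene)
    # A gene qualifies only if it has a hit inherited from each side.
    return sorted(maternal & paternal)
```

Without the parents you cannot tell whether two heterozygous variants in a gene sit on different copies or on the same one, which is another reason the trio data is so valuable.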

If you think you have found the causative variant(s), then you need to go through a process of proving it. The problem is that we have so many variants that if you look at the whole genome, you will find something, even in a healthy person. When we analyse someone in our lab with the parents available, we will typically produce a list of 20 genes that have some convincing arrangement of variants. The first hurdle that they need to pass is whether the gene is associated with the correct disease at all. If something is convincing, then a good guide to proving it is the ACMG guidelines [3]. These show how much evidence is required to classify a variant as pathogenic, and how to assemble that evidence.

Be very careful as a non-geneticist. Because we have so many variants, it is very easy to pick one and believe that it is the cause. The prior probability is that it isn't, unless you can gather significant evidence that it is. Early genetics studies tended to assume that if something was found, it must be the cause, and we are now paying for that. My lab recently published a paper refuting the association of some genes with a disease, because those associations were made back when standards were not as high and we did not have access to the population databases like GnomAD that we have now, and they were just wrong. If you think you have found the cause, then you will absolutely need to get it checked by someone qualified.

I wish you the very best of luck.

  [0] https://www.nature.com/articles/gim201726
  [1] https://gnomad.broadinstitute.org/
  [2] https://www.omim.org/
  [3] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4544753/


> My lab recently published a paper refuting the association of some genes with a disease

Link?


If you want to try an alternative way, please search Dr. Andy Lee. Good Luck!


Try something different.


Once you have identified the genes, a look at your partner's diet & supplement regime will help because chemicals will alter gene functions.

https://reset.me/story/epigenetics-how-you-can-change-your-g...


As someone who has been trying to keep himself alive, it's definitely a battle.


"If the wildest fantasies of cryptocurrency enthusiasts were to come true, if all the environmental and technical objections were to fall away, the result would be financial capitalism with all the brakes taken off."

This is sensationalism at best, and outright misinformation at its worst. Come on, the crypto crowd isn't calling for abolishing regulation and creating pure financial anarchy. The general sentiment is we want more transparency so that innovation can continue and "Web3" ideas can take shape.

I agree that we continue to over speculate into bubbles, and many people lose each time. I hope that most people are told and understand the serious risks involved. I'm not naive enough to think this is the case; some people are destined to lose their money and crypto makes it much easier. But the true innovation of blockchain still stands, and that is we've found a way to trust each other through bits instead of people, and that's a damn big deal.

These articles are good in that they reflect the inherent risks of investing, but bad in that they paint an image of the evil crypto doers out to kill off all financial rule. That's not the case.


Quick question - this comment looks like it was aimed at another thread, or narrowly, another comment in this thread. Or did Arc just glitch?


Yeah, this is very strange. I've never even read this thread, and I (think) I remember seeing this on the correct page at one point.

Maybe a bug somewhere? My comment was in response to a thread linking this article: https://www.watershed.co.uk/studio/news/2021/12/03/case-agai...



