Usage
Some basic tips on how to use CIViCmine
- You can filter the results using the panel on the left side of the Browse page. This panel lets you filter by evidence type, gene, cancer type, drug name, variant type and whether the biomarker already exists in the CIViC database.
- To deselect a gene, cancer type or drug, click the dropdown box, press Backspace, and then click away or press Escape. Do not press Enter. Unfortunately, you cannot select an empty option; this is a reported issue with Shiny.
- You can then click on a row in the table to bring up the associated citations in the table at the bottom. This table includes a link to the PubMed citation, the journal and publication year, the section of the paper the sentence came from (title, abstract or article) and the sentence itself.
- You can also click on a gene, cancer type or drug shown in the pie chart to jump straight to the biomarkers involving your selection. Your choice will be shown in the dropdown boxes on the left.
- Matching with CIViC only takes into account the evidence type, gene, cancer type and drug (if applicable).
- Gene names are from HUGO, cancer types are from the Disease Ontology and drug names are from WikiData.
- The system tries to normalize gene, cancer and drug names to those ontologies (e.g. HER2 -> ERBB2, ESCC -> esophagus squamous cell carcinoma and AZD9291 -> osimertinib).
- When a name cannot be resolved to a single entry, all the possible matches are listed, separated by semicolons. For example, p75 is mapped to "CUX1;HCLS1;PSIP1;SIGLEC7;TNFRSF1B" because it is highly ambiguous.
- The system also tries to detect possible fusions or combinations. These are separated by a pipe ('|'), so you may see 'BCR|ABL1'.
- The system tries to understand the meaning of each sentence. We have tuned it for high precision so that it makes as few false positives as possible, but there will still be some mistakes.
- The data is updated once a month (roughly on the 1st) with the latest publications.
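The semicolon and pipe conventions above can be handled with simple string splitting. The Python sketch below assumes that '|' separates the members of a fusion or combination and ';' separates the alternative normalizations of one ambiguous name; how the two interact when both appear in one field is an assumption, not something documented here. The `civic_match_key` helper and the example values are hypothetical and only illustrate that matching with CIViC uses the evidence type, gene, cancer type and drug, not the variant.

```python
from typing import List, Optional, Tuple

# Assumption: '|' joins the members of a fusion/combination and ';' joins
# the alternative normalizations of a single ambiguous name.


def parse_entity_field(field: str) -> List[List[str]]:
    """Split e.g. 'BCR|ABL1' or 'CUX1;HCLS1' into components and candidates."""
    return [component.split(";") for component in field.split("|")]


def is_ambiguous(field: str) -> bool:
    """True if any component has more than one candidate normalization."""
    return any(len(candidates) > 1 for candidates in parse_entity_field(field))


def is_combination(field: str) -> bool:
    """True if the field names a fusion or combination of several entities."""
    return len(parse_entity_field(field)) > 1


def civic_match_key(
    evidence_type: str, gene: str, cancer: str, drug: Optional[str] = None
) -> Tuple[str, str, str, str]:
    """Hypothetical key for comparing a biomarker with CIViC.

    Only evidence type, gene, cancer type and drug are used; the variant is
    ignored. A missing drug is stored as an empty string.
    """
    return (evidence_type, gene, cancer, drug or "")


if __name__ == "__main__":
    print(parse_entity_field("BCR|ABL1"))  # [['BCR'], ['ABL1']]
    print(is_ambiguous("CUX1;HCLS1;PSIP1;SIGLEC7;TNFRSF1B"))  # True
    # Made-up example values, not taken from CIViC.
    civic_keys = {civic_match_key("Predictive", "EGFR", "lung cancer", "erlotinib")}
    print(civic_match_key("Predictive", "EGFR", "lung cancer", "erlotinib") in civic_keys)  # True
```

Keeping candidates as nested lists, rather than flattening them, preserves the difference between "one of these genes" and "all of these genes together".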
FAQs
Q: When I type in a gene, cancer or drug, it isn't autocompleted. Why is that?
A: If a gene, cancer type or drug doesn't come up, it isn't in CIViCmine. Check that you are using the standard name (e.g. ERBB2, not HER2) from the associated ontology: HUGO for genes, the Disease Ontology for cancer types and WikiData for drugs.
Q: What text is being mined?
A: We are processing the entirety of PubMed (~22 million abstracts) and the PubMed Central Open Access subset (~1 million full-text articles).
Q: Are you just using co-occurrences?
A: No. Co-occurrences count how many times a gene and a cancer type appear in the same sentence. We use the Kindred relation extraction package to understand the context of each sentence and only extract sentences that are highly likely to discuss a cancer biomarker.
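To make the distinction concrete, here is a minimal Python sketch of the co-occurrence counting that CIViCmine does not rely on. It assumes the genes and cancer types in each sentence have already been recognized, and the example sentences are made up. Kindred itself is not shown.

```python
from collections import Counter
from itertools import product
from typing import Iterable, Set, Tuple


def count_cooccurrences(sentences: Iterable[Tuple[Set[str], Set[str]]]) -> Counter:
    """Count the sentences in which each (gene, cancer type) pair appears together.

    Each sentence is given as (genes mentioned, cancer types mentioned).
    Nothing about the meaning of the sentence is considered.
    """
    counts: Counter = Counter()
    for genes, cancers in sentences:
        for pair in product(sorted(genes), sorted(cancers)):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Each tuple stands for one sentence: (genes mentioned, cancer types mentioned).
    sentences = [
        ({"ERBB2"}, {"breast cancer"}),
        ({"ERBB2", "TP53"}, {"breast cancer"}),
        ({"TP53"}, {"lung cancer"}),
    ]
    counts = count_cooccurrences(sentences)
    print(counts[("ERBB2", "breast cancer")])  # 2
```

A sentence stating that a gene is a biomarker for a cancer and one stating that it is not associated with that cancer add the same count here, which is why a relation extraction step that reads the sentence context is used instead.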
Q: How regularly is this updated?
A: CIViCmine is updated monthly (around the 1st of the month). It uses the PubRunner package, which makes updating easier and minimises the additional computation needed for new abstracts and papers in PubMed and the PubMed Central Open Access subset.
Q: I found a sentence that doesn't say what CIViCmine describes. Why?
A: CIViCmine is completely automated with no human curation, so there will be some mistakes. We have tuned the system for high precision so that it makes as few false positive mistakes as possible, at the cost of likely more false negatives. All information offered by CIViCmine should be interpreted accordingly.
Q: Could this approach be used to extract other types of biomedical knowledge?
A: It probably could. If you'd like to discuss it, please get in touch with Jake Lever.
Q: Can I bulk download the data?
A: You can bulk download the full dataset from our Zenodo repository, or use the download buttons on this website to get a subset of the data. All data is licensed under Creative Commons Zero (CC0).
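If you work with a downloaded file, a Python sketch like the following can pull out every row that mentions a particular gene, including rows where it appears inside a fusion ('|') or an ambiguous list (';'). It assumes the file is tab-separated with a header row; the column name `gene` and the sample rows are placeholders, so check the header of the file you actually download.

```python
import csv
from typing import Dict, Iterable, Iterator


def rows_mentioning(lines: Iterable[str], column: str, name: str) -> Iterator[Dict[str, str]]:
    """Yield rows of tab-separated text whose `column` mentions `name`.

    `lines` can be an open file handle. Fields such as 'BCR|ABL1' or
    'CUX1;HCLS1' are split on '|' and ';' so a row matches if `name`
    is any one of the listed members (exact match, not a substring).
    """
    for row in csv.DictReader(lines, delimiter="\t"):
        members = (row.get(column) or "").replace("|", ";").split(";")
        if name in members:
            yield row


if __name__ == "__main__":
    # Made-up sample; a real download would be opened with
    # open(path, newline="", encoding="utf-8") and passed in the same way.
    sample = [
        "gene\tcancer",
        "BCR|ABL1\tchronic myeloid leukemia",
        "ERBB2\tbreast cancer",
    ]
    for row in rows_mentioning(sample, "gene", "ABL1"):
        print(row["cancer"])  # chronic myeloid leukemia
```

Matching whole members rather than substrings avoids, for example, a search for "ERB" returning ERBB2 rows.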
Q: Where can I read more about the methods and full results?
A: This work has been published in Genome Medicine. Please cite this paper if you make use of the data.