Introduction

I hold a Ph.D. in Biology (2000; specialization in molecular genetics), augmented with considerable programming and computational expertise. The summary below provides a reasonably complete overview of my background and my overlapping personal and professional pursuits.

I possess a unique and well-honed wealth of knowledge and experience, spanning

  • biochemistry; cell biology; metabolic pathways; cellular signaling pathways
  • molecular genetics and genomics, including cancer biology
  • bioinformatics
  • information retrieval and extraction

and more recently programming experience including, in descending order of expertise/experience:

  • Linux super-user
  • command-line, bash scripting expertise
  • Python; virtual environments
  • NLP: natural language processing
  • Apache Solr: high-performance document indexing/storage/search (backend), + user GUI (frontend): Carrot2 clustering engine; D3.js visualizations
  • R (“GNU S”/CRAN)
  • graphical knowledge stores/graphs (KG: Neo4j; Cypher query language)
  • some relational databasing (PostgreSQL)
  • some machine learning (computer vision; vector space models; …)
  • basic, “hands-on” familiarity with:
    • webdev (HTML/JS/CSS)
    • Java (some basic scripting; IntelliJ IDEA …).

I am a moderate contributor to:

Regarding machine learning (ML), for a period of ~1.5+ years (2015-2017) I immersed myself in that domain (arXiv; reddit ML subreddit; Y Combinator “Hacker News”; RSS feeds). I can follow the recent literature (which is progressing at a staggering pace), install and debug the major platforms (Theano; Caffe; Torch7; TensorFlow; etc.), and clone and implement basic models. I have also posted contributions/solutions associated with various GitHub “Issues” and some StackOverflow.com questions.

During that period I implemented various personal, self-taught ML projects, including:

  • an image captioning system
  • a webcam-based personal identification system: faces marked by bounding boxes, each labeled with the person's name and a probability
  • a webcam-based classifier: backend: ImageNet top-five categories via a 50-layer residual neural network (ResNet-50), + web browser frontend
  • computer vision (webcam) age-gender classifier
  • other experiments …

That work was fun and rewarding, but my primary motivation in developing those skills and awareness was to apply supplemental, ML-based approaches to biomedical natural language processing (BioNLP) and bioinformatics, including:

  • classification:
    • RNN/LSTM (recurrent neural nets/long short-term memory-based models)
    • VSM (vector space models)
    • dimensionality reduction
  • knowledge discovery:
    • topic modeling
    • graph traversal
  • BioNLP:
    • biomedical named entity recognition (BioNER)
    • dependency parsing, applied e.g. to: relation extraction (to populate RDBMS and derived knowledge graphs; …)
    • semantic parsing, applied e.g. to question-answering; …
  • fact-checking; quality assurance (“noise” issues)

These tools and approaches support my personal and professional goals, summarized below:

  • information retrieval
  • information extraction
  • knowledge stores
  • quality assurance
  • knowledge discovery
  • biomodels

in support of my long-standing and overarching goal of facilitating a greater understanding of functional genomics: the phenotypic and functional expression of the information contained within our genomes.

I envision, ultimately, the creation of virtual networks (pathways; perhaps cells/tissues/organs), amenable to in silico perturbations and interventions for assessing changes in

  • metabolism
  • cellular signaling
  • cellular growth/death
  • pathogenesis

in response to simulated changes in

  • mutations; genomic alterations
  • epigenetic alterations
  • biochemical entities
  • cellular signaling pathways
  • environmental conditions (stressors)

that in turn could guide, for example:

  • personalized and precision medicine (individualized susceptibilities; therapeutic interventions; …)
  • basic research: augmentation of “wet lab” experiments, via identification/ranking of genomic “variants-of-interest” (SNPs); …
  • synthetic biology.

My stepwise approach in this regard has been to model biochemical and molecular biology/genomics data, first/to date as:

  • backend:
    • indexed literature (Solr)
    • BioNLP
    • extract high-quality data/relations to an RDBMS (PostgreSQL)
    • “on-the-fly” population of a graphical model (Neo4j) from those data, in response to user queries
  • frontend:
    • GUIs: Carrot2 clustering/visualization engine; D3.js visualizations; …
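
As an illustration only (not the actual implementation), the last two backend steps above might be sketched as follows. The sketch assumes Python's built-in sqlite3 as a stand-in for PostgreSQL and a plain adjacency dictionary as a stand-in for Neo4j; the table name, column names, function names, and the three example relations are all hypothetical.

```python
import sqlite3
from collections import defaultdict


def build_store(relations):
    """Load (subject, predicate, object) rows into an in-memory SQL table.

    sqlite3 is used here only as a stand-in for PostgreSQL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE relation (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", relations)
    conn.commit()
    return conn


def query_graph(conn, entity):
    """Build a small graph "on the fly" from the rows that mention `entity`.

    The returned dict maps each subject to a list of (predicate, object)
    edges; it is a stand-in for a graph populated in Neo4j.
    """
    rows = conn.execute(
        "SELECT subject, predicate, object FROM relation "
        "WHERE subject = ? OR object = ? ORDER BY subject, object",
        (entity, entity),
    ).fetchall()
    graph = defaultdict(list)
    for subject, predicate, obj in rows:
        graph[subject].append((predicate, obj))
    return dict(graph)


# Hypothetical, hand-written example relations (not extracted data).
RELATIONS = [
    ("MDM2", "inhibits", "TP53"),
    ("TP53", "activates", "CDKN1A"),
    ("CDKN1A", "inhibits", "CDK2"),
]

conn = build_store(RELATIONS)
graph = query_graph(conn, "TP53")
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} -[{predicate}]-> {obj}")
```

Here a user query for one entity pulls only the matching relations from the relational store and assembles them into a graph, rather than maintaining the full graph up front.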

Subsequent stages could involve

  • dynamically linking those bioentities to biochemical, molecular genetic, and biomolecular databases and other data sources
  • constructing in silico models of metabolic and cellular signaling pathways, to aid personalized medicine and basic research …

I believe that all of these aims are fully tractable, and I have been working diligently toward their realization. I also believe those aims are well-aligned with current research areas in biomedical literature processing, bioinformatics, molecular genetics, cancer genomics, pharma and personalized medicine that are of significant academic and commercial interest.

If you find my expertise relevant to this or another position, please do not hesitate to contact me. I am also willing to offer my services as a Collaborator (academic; commercial). I have experience collaborating and working online, but I am also willing and able to relocate, as needed.

Dr. Victoria A. Stuart, Ph.D. Vancouver, B.C.

Miscellany:

My curriculum vitae (pdf)

E-mail me here