Distributed representation: phenotypes and genes as inputs and features

Posted 6 February 2016 at 06:23 at age 27.

While reading the introduction to the Deep Learning book by Ian Goodfellow, Yoshua Bengio and Aaron Courville, I came across the concept of “distributed representation.” This idea struck me as parallel to the depiction of genetics in Richard Dawkins’s 1976 book The Selfish Gene.

Figure 2 from: Asgari E, Mofrad MRK (2015) Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS ONE 10(11): e0141287. doi:10.1371/journal.pone.0141287

Distributed representation apparently arose during the “connectionism movement” in the 1980s, and means:

Each input to a system should be represented by many features.
Each feature should be involved in the representation of many possible inputs.

The given example is how distributed representation allows modeling nine objects of three different types, three colors each, using only six neurons: one for each color and one for each type. This is better than nine neurons each dedicated to a specific type-color combination. It’s easy to see how the savings grow: with n types and m colors, the distributed version needs n + m neurons instead of n × m, so the gap widens with larger numbers of inputs (or features? – I’m a little fuzzy still).
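To make the counting concrete, here is a minimal sketch in Python of the two encodings. The type and color names are placeholders I picked for illustration, and the "neurons" are just 0/1 list entries, not anything from a real deep learning library.

```python
from itertools import product

# Placeholder names, chosen only for illustration.
TYPES = ["car", "truck", "van"]
COLORS = ["red", "green", "blue"]


def one_hot(kind, color):
    """Local code: one unit per type-color combination (3 * 3 = 9 units)."""
    combos = list(product(TYPES, COLORS))
    return [1 if combo == (kind, color) else 0 for combo in combos]


def distributed(kind, color):
    """Distributed code: one unit per type plus one per color (3 + 3 = 6 units)."""
    type_units = [1 if t == kind else 0 for t in TYPES]
    color_units = [1 if c == color else 0 for c in COLORS]
    return type_units + color_units


local = one_hot("truck", "blue")
shared = distributed("truck", "blue")
print(len(local), sum(local))    # 9 units, 1 active
print(len(shared), sum(shared))  # 6 units, 2 active
```

In the distributed version every input lights up two units, and every unit takes part in three different inputs, which is exactly the pair of properties listed above.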

Throughout the introduction, the authors point out how deep learning models parallel biological neurons and how the brain works, but they emphasize that deep learning marches on with little input from neuroscience. The current lack of investigative tools into brain function undoubtedly plays a role here.

Another parallel, and the one that compelled me to write this down for later reflection, is that the description of distributed representation matches my understanding of genes and phenotypes. I don’t think this is anything new: we’ve known for a while that some genes have multiple effects (pleiotropy), that some effects depend on multiple genes (polygenic traits), and that there’s much interdependence.
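As a rough sketch of that analogy, the two criteria for distributed representation can be checked against a gene-to-trait mapping, treating traits as the "inputs" and genes as the "features." Everything in this Python snippet is hypothetical: the gene names, the traits, and the mapping are made up for illustration and are not real genetics.

```python
# Hypothetical toy mapping: which traits each made-up gene contributes to.
GENE_EFFECTS = {
    "gene_a": {"height", "eye_color"},
    "gene_b": {"height", "metabolism"},
    "gene_c": {"eye_color", "metabolism"},
}


def genes_for(trait, mapping=GENE_EFFECTS):
    """Return the genes (features) that take part in a given trait (input)."""
    return sorted(gene for gene, traits in mapping.items() if trait in traits)


def is_distributed(mapping):
    """Check both criteria: every trait involves more than one gene,
    and every gene is involved in more than one trait."""
    all_traits = set().union(*mapping.values())
    each_gene_has_many_traits = all(len(traits) > 1 for traits in mapping.values())
    each_trait_has_many_genes = all(
        sum(trait in traits for traits in mapping.values()) > 1
        for trait in all_traits
    )
    return each_gene_has_many_traits and each_trait_has_many_genes


print(genes_for("height"))
print(is_distributed(GENE_EFFECTS))
```

A one-gene-one-trait mapping such as `{"gene_x": {"trait_y"}}` fails the check, which is the genetic counterpart of dedicating one neuron to each type-color combination.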

It’s interesting that the structure of genes and their expressed phenotypes is similar to the structure of neurons and their computed outputs. Perhaps the general computation algorithms behind neurons are also at work in our DNA, though we don’t recognize genes as performing computation. But what else could it be? DNA sequences are clearly data. They just happen to be data that are also molecules, which spontaneously cause other molecules to build proteins and more, leading to a biological end result. DNA could be thought of as data that initiates its own computational evolution. It builds a computer (proteins, cells, tissues, a whole body) that does work in the universe and ends up creating new DNA, generally through reproduction, perpetuating the cycle.

So, what does that mean?

Well, my knowledge in all these areas is limited, but I want to learn more! Even though billions of years of evolution resulted in a system where the data (DNA) computes on itself, and we often find nature’s solutions are optimal, I doubt the ultimate machine learning situation would involve data creating its own computational framework in quite the same way. But who knows. This all does have me thinking about the idea that the universe itself might be a computer, which I think I first read about in Ray Kurzweil’s The Singularity Is Near. If all of life is based on DNA, and evolution could be looked at as DNA computation, then all of life could be a computer. I think the universe-computer idea was more along the lines of physics and quantum particles, but maybe that just means biology is a bunch of virtual machines running on a quantum computer that is the universe. (I swear I am totally sober right now.)

I did a quick search for "distributed representation" genetics to see if this is a well-discussed area, but the results were overrun with information about genetic computation, which is another super fascinating area I hope to explore some day. I did find a slightly more closely related and recently published paper, “Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics” by Ehsaneddin Asgari and Mohammad R. K. Mofrad of the University of California, Berkeley. The paper describes using deep learning to analyze protein sequences, and their “results indicate that by only providing sequence data for various proteins into this model, accurate information about protein structure can be determined.” It seems pretty cool, but it doesn’t seem to discuss “genes as features,” as I was hoping to find.

Stuff like this makes me question what I’m doing with my life. Every time I read something in any branch of science or computing, I find myself asking questions that seem to require doctoral-level research to answer. But even if I did pursue more school, I can’t do it in all directions at once!

