Meet's Thoughts

January 7th, 2025

Why automata?

When I talk about my research in neural cellular automata for training neural networks, the people who understand the gist of it usually respond with some skepticism about whether it would lead to any better outcome than either of the following, more mainstream paths:

Just scaling language models up, either at training or test time
Making language models work more with each other, or the whole "agentic systems" pathway

To be honest, I'm not so sure those are worse ways than mine. They seem like they could work really well for a lot of things, and that's why there's a lot of attention on them right now.

I'm not one to pretend that I've fully thought through all possible reasons why I do what I do. I kind of just do what I think is right intuitively. Intuition for me usually comes with a bunch of reasoning that's associated with it, but it really isn't too clear to me which came first: my feeling or the reasons. I feel that both come from some prior cause, some abstract processing I'm doing in my own head without words or even intention. Just baseline thought progression.

It is actually by analogy with the way I feel my own processing happens that I'm going towards this automata idea. I want a neural network that thinks. And I don't need it to think in a "chain-of-thought" kind of way. I need it to think on its own, through some process that just grows. I guess I need the neural network to breathe. An ambient state of being, some internal learning without any input or output expected immediately.

Of course, it does need dependence on data. It needs some interaction with something outside itself, and that can be provided. But later on, parts of the network can communicate with each other, providing the network with its own "training". A kind of digesting can happen here. Maybe Geoff Hinton would call this a sleep state, like when he talks about his forward-forward algorithm. I think something like it just needs to happen continuously all the time.

In order to have this internal communication, internal teaching kind of thing, I need the network components to be somewhat independent from each other. That's where the locality of updates can help: part of the network can be a student, and the other part the teacher, and some kind of signaling can happen between them locally. Also, the "student" and "teacher" parts aren't discrete, since they are both made up of many weights and hidden states that could overlap and change constantly.

My guess is that the approaches with LLMs are not really amenable to this kind of view of thought. The compressed chain-of-thought work kind of gets closer to this style, but even if it's "abstract reasoning" it's still not changing the weights of the network itself. I think, at some point, you can't just have a huge context window of tokens (discrete or otherwise; the language model alone or with friends). You're gonna have to learn something, and for that you're going to need the model to be able to change its weights.

Aug. 11th, 2024

Telescoping evolution and technology

Evolution is a process that has resulted in telescoping versions of itself manifesting at different scales and with different entities. (Shout out to Waking Life). Biological evolution over millions of years begets anthropological evolution over thousands of years to create civilizations that beget cultural evolution within and between civilizations. There are more examples over time as these telescoping evolutionary processes get faster and faster -- some processes are completed and restarted within our own lifetimes. Technological evolution starting from before the industrial revolution continues into the information age, giving rise to digital evolution through the internet, giving birth to hundreds of tiny evolutionary processes on social media, and now more recently giving birth to evolutions of language models (Evolutionary Optimization of Model Merging Recipes) that create possibilities to evolve generators of both new ideas and new cultural trends. The way the singularity happens is naturally through an evolution of AI generating algorithms which are themselves evolutionary processes.

Oct. 25th, 2023

Some stream-of-consciousness thoughts about consciousness in AI

A little late, but I noticed a lot of people have been talking more about consciousness in AI since language models got better, like that engineer from Google making a fuss about their LM being conscious, and other people. I also remember a lot of discussions after NeurIPS last year, when David Chalmers gave a keynote about the necessary conditions for consciousness, possible ways to test for some of those, and the others we don't really have the ability to test at this time.

Regardless of the "utility" of conscious AI, I think it's really interesting. Probably one of the most interesting things I could think of is exploring what makes our consciousness actually work and how to make one (other than having babies I suppose...different kind of interesting). However, it is one topic that really viscerally scares me. And no, not in the existential risk type of way that some people are focused on, for some good and some iffy reasons. I don't think most of those reasons particularly require consciousness anyway.

What I'm afraid of: Actually understanding the building blocks of consciousness feels like it would change my personal experience of life somehow. Like if we knew something about how conscious organisms experience time, and that we're able to change that. Or, that it's possible to construct situations where all outside interactions with the conscious entity make it seem like "one" entity when in fact it's made of several conscious entities (maybe some having a worse time than others!). What if we just have an additional silent observer consciousness sharing our brains who "has no mouth and must scream"? Or beyond this, many people's assumptions of an entirely material world might be disproven. We have a situation where we know if you change a brain, you can change the conscious experience of someone. But no matter how much physical info you have about a person's brain, you don't know what they're actually experiencing, because consciousness is inherently, definitionally subjective, and that's the hard problem of consciousness. I am of the view that the hard problem of consciousness is indeed a problem, that philosophical zombies are conceivable, etc., and I haven't seen an argument yet to convince me otherwise.

So then we have a causal connection from the "real (material) universe" to what goes on in somebody's conscious experience. We have the "real world", and we got the whole "consciousness world" where people's qualia live I guess. That world: what rules govern it? Does it have rules? What can change? Different people have different experiences -- is that just because of different functional components? Many questions along these lines, and probably more relevant ones, have undoubtedly been asked by people much deeper into {the philosophical literature, spirituality, theology, neuroscience, etc.} than me.

But anyway, does the connection go in one direction? Are we so sure? And are the connections one-to-one? Is it possible to change someone's consciousness by somehow effecting a change in the "consciousness world" without necessarily affecting anything in the material world in the normal way to cause a change in consciousness? Probably not I guess but...why?

Anyway thanks for reading my blog :)

Protein ML Research

Thoughts about proteins as of Nov. 22, 2022

Working on a project using large-scale protein interaction data for protein design. Also thinking about collective intelligence in deep learning. Models like neural cellular automata are interesting: the system's behavior captures some compositionality and is robust to perturbations. A happy face needs eyes and a mouth; you can mess them up and end up with a long mouth and one eye or have two faces competing for space. Why not have two magnesium binding regions on a protein? I watched this a little too much

Houston, we got prions

Previous thoughts about proteins

Thoughts about proteins as of Dec. 8th, 2021

I'm working on a project to predict protein function descriptions in natural language, and focusing on evaluating the functional descriptions in an automated way. I'm trying to choose a good metric for this, starting my search with BLEU and other measures used in machine translation, and measures mentioned in this paper. If you know a lot about such measures, contact me! Compared to machine translation, this problem makes different assumptions about what constitutes a good match for a pair of descriptions, and about how to score a set of descriptions against a set of functions for a particular protein.
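To make the BLEU starting point concrete, here is a minimal sketch of the clipped (modified) n-gram precision that BLEU is built on. It is only an illustration: the function names and the example descriptions are made up, whitespace tokenization is an assumption, and full BLEU additionally combines several n-gram orders and applies a brevity penalty.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """Clipped n-gram precision of one candidate against several references.

    Each candidate n-gram is credited at most as many times as it appears
    in the single reference where it is most frequent.
    Tokenization by whitespace is an assumption of this sketch.
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical function descriptions, for illustration only.
refs = ["binds magnesium", "magnesium ion binding"]
p1 = clipped_precision("binds magnesium ions", refs, n=1)
p2 = clipped_precision("binds magnesium ions", refs, n=2)
```

In this toy case "ions" gets no credit even though a reference contains "ion", which is one example of how exact n-gram matching can disagree with what counts as a good match between functional descriptions.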

Prior thoughts

I've been working on function/fold/class discovery for proteins recently. I'm thinking about neural network-based clustering algorithms, though I know there are possibly much better ways to approach class discovery for proteins (probabilistic programming, energy-based models). I want to learn more about those better approaches, but I still think it's worth exploring how the new techniques developed for unsupervised image classification could be adapted to proteins, just to see how they'd do.

Work surrounding mine

Some related work that I think is interesting.

Vilnis, Luke, and Andrew McCallum. "Word representations via gaussian embedding." arXiv preprint arXiv:1412.6623 (2014).
Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
Grover, Aditya, and Jure Leskovec. "node2vec: Scalable feature learning for networks." Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 2016.
Radford, Alec, et al. "Learning transferable visual models from natural language supervision." arXiv preprint arXiv:2103.00020 (2021).
Van Gansbeke, Wouter, et al. "Learning to classify images without labels." arXiv preprint arXiv:2005.12320 (2020).
Caron, Mathilde, et al. "Deep clustering for unsupervised learning of visual features." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
Singh, Rohit, Jinbo Xu, and Bonnie Berger. "Global alignment of multiple protein interaction networks with application to functional orthology detection." Proceedings of the National Academy of Sciences 105.35 (2008): 12763-12768. DOI: 10.1073/pnas.0806627105
Ashburner, Michael, et al. "Gene ontology: tool for the unification of biology." Nature genetics 25.1 (2000): 25-29.
Zhou, Naihui, et al. "The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens." Genome Biology 20 (2019): 244. https://doi.org/10.1186/s13059-019-1835-8

IsoRank (Singh et al. 2008)

IsoRank is a global network alignment algorithm that can be used to detect functionally similar proteins between two interaction networks. It involves two main steps:

Solving an eigenvalue equation to compute a functional similarity score matrix R for all pairs of proteins across the two networks
Extracting a set of high-scoring and mutually consistent matches from the R matrix.

We used the first scoring step in NetQuilt, because the similarity profiles that IsoRank computes are pretty informative features for function prediction across multiple species.
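As a rough illustration of that first scoring step, here is a minimal power-iteration sketch in NumPy. It follows the IsoRank idea that a pair (i, j) scores highly when the neighbors of i and the neighbors of j score highly, blended with a prior similarity matrix. The function name, the default `alpha`, the uniform prior, and the dense-matrix formulation are assumptions of this sketch, not the NetQuilt or original IsoRank code.

```python
import numpy as np


def isorank_scores(adj1, adj2, prior=None, alpha=0.6, n_iter=100, tol=1e-9):
    """Power iteration for an IsoRank-style similarity matrix R (n1 x n2).

    adj1, adj2: symmetric adjacency matrices of the two interaction networks.
    prior: optional (n1 x n2) prior similarity, e.g. from sequence scores;
           a uniform prior is used if omitted (an assumption of this sketch).
    alpha: weight on network topology versus the prior (hypothetical default).
    """
    adj1 = np.asarray(adj1, dtype=float)
    adj2 = np.asarray(adj2, dtype=float)
    n1, n2 = adj1.shape[0], adj2.shape[0]

    # Column-normalize so each node splits its score evenly among its neighbors.
    deg1 = adj1.sum(axis=0)
    deg2 = adj2.sum(axis=0)
    P1 = adj1 / np.where(deg1 > 0, deg1, 1.0)
    P2 = adj2 / np.where(deg2 > 0, deg2, 1.0)

    if prior is None:
        prior = np.ones((n1, n2))
    prior = np.asarray(prior, dtype=float)
    prior = prior / prior.sum()

    R = prior.copy()
    for _ in range(n_iter):
        # R_ij = sum over neighbors u of i and v of j of R_uv / (deg(u) * deg(v))
        R_new = alpha * (P1 @ R @ P2.T) + (1 - alpha) * prior
        R_new /= R_new.sum()
        converged = np.abs(R_new - R).sum() < tol
        R = R_new
        if converged:
            break
    return R


# Toy example: align a 3-node path graph (0 - 1 - 2) with itself.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
R = isorank_scores(path, path)
```

In this toy case the pair of middle nodes gets the highest score. Each row of R can then be read as a similarity profile for one protein against every protein in the other network.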

SCAN (Semantic Clustering by Adopting Nearest neighbors) (Van Gansbeke et al. 2020)

SCAN is a neural network-based clustering algorithm that has been used to classify images in an unsupervised way. This one involves three main steps:

Learn image features with a self-supervised neural network, and build a k-nearest neighbors graph in this learned feature space
Train a softmax layer on top of the embeddings of the previous model using a semantic clustering loss function that encourages neighboring samples in the k-nearest neighbors graph to be in the same class
Train further on strongly augmented images using pseudo-labels extracted from the model, in order to refine and increase confidence in predictions

I'm currently exploring how this could work with protein sequences because, like most other biological data collected, most of them are unlabeled and need to be categorized.
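The semantic clustering loss from the second SCAN step can be sketched in a few lines of NumPy. This is a simplified version under stated assumptions: it takes precomputed softmax outputs and neighbor indices, averages over neighbors rather than summing, and uses a hypothetical default for the entropy weight. It is not the reference implementation.

```python
import numpy as np


def scan_loss(probs, neighbor_idx, entropy_weight=5.0, eps=1e-8):
    """Simplified SCAN-style semantic clustering loss.

    probs: (n, c) array of softmax cluster probabilities, one row per sample.
    neighbor_idx: (n, k) integer array; row i holds the indices of the
                  k nearest neighbors of sample i in the learned feature space.
    entropy_weight: weight of the entropy term (hypothetical default).
    """
    probs = np.asarray(probs, dtype=float)
    neighbor_probs = probs[neighbor_idx]  # shape (n, k, c)

    # Consistency term: a sample and its neighbors should put their
    # probability mass on the same cluster (large dot product).
    dots = np.einsum("nc,nkc->nk", probs, neighbor_probs)
    consistency = -np.log(dots + eps).mean()

    # Entropy term: negative entropy of the mean assignment, which penalizes
    # collapsing every sample into a single cluster.
    mean_probs = probs.mean(axis=0)
    neg_entropy = np.sum(mean_probs * np.log(mean_probs + eps))

    return consistency + entropy_weight * neg_entropy


# Toy example: 4 samples, 2 clusters, neighbor pairs (0, 1) and (2, 3).
neighbors = np.array([[1], [0], [3], [2]])
agree = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
disagree = np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]])
collapsed = np.array([[0.9, 0.1]] * 4)

loss_agree = scan_loss(agree, neighbors)
loss_disagree = scan_loss(disagree, neighbors)
loss_collapsed = scan_loss(collapsed, neighbors)
```

The assignment where neighbors agree and the clusters stay balanced gets the lowest loss of the three. For proteins, the same function would apply unchanged once the rows of `probs` come from a sequence model and the neighbors from a sequence embedding space.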
