
European RosettaCon on Nov 11-13, 2024

European RosettaCon Program

Day 1 - Monday, November 11th, 2024

Welcome

8:30 - 9:00

          Registration, Coffee + Croissants

9:00 - 9:10

          Opening Remarks

Session I - Enzymes

09:10 - 10:35

          Donald Hilvert: Designing artificial enzymes de novo

Enzyme design represents a formidable challenge. We do not fully understand the rules of protein folding, and our knowledge of structure-function relationships in these macromolecules is at best incomplete. Recent progress in combining computational and evolutionary approaches for the design of artificial metalloenzymes will be discussed, together with insights into enzyme function gained from studies of the engineered catalysts.
Talk length: 25 minutes; estimated time: 9:10 am

          Ramiro Illanes Vicioso: Biophysical characterization and Catalysis of AI-designed carbonic anhydrases

Our project focuses on the ability of ZymCTRL, a conditional language model trained on the enzyme sequence space, to generate stable and active proteins with catalytic performance close to that of enzymes found in Nature. We isolated seven carbonic anhydrases designed by ZymCTRL with an amino acid sequence identity below 50%, and defined their oligomeric state, purity, stability and catalytic yield. The seven candidates proved to be monomeric (SEC-MALS), folded (CD, SAXS) and active (Wilbur-Anderson assay), achieving catalytic performances in line with their natural counterparts, such as the carbonic anhydrase from E. coli. These results demonstrate ZymCTRL's ability to generate proficient enzymes without further training or rounds of optimization.
Talk length: 10 minutes; estimated time: 9:35 am

          Abbie Lear: Understanding evolutionary improvements in designer enzymes to inform computational design

In this study we investigate the catalysis of the de novo designed Kemp eliminase 1A53-2, which has been improved 1,000-fold by directed evolution, to determine guidelines for future enzyme design. We calculate the reaction barrier for a range of enzyme conformations using QM/MM simulations and the adaptive string method, reproducing activity trends between variants. Analysis of transition state structures reveals features that determine activity and could be leveraged for future design.
Talk length: 10 minutes; estimated time: 9:45 am

          Núria Mimbrero Pelegrí: REXzyme: A Translation Machine for the Design of Enzymes Tailored to Specific Reactions

REXzyme is an innovative enzyme design approach inspired by natural language processing and chemical and protein language models. It uses the T5 encoder-decoder model to translate desired chemical reactions into amino acid sequences. Trained on a dataset of 21 million non-redundant reaction-enzyme sequence pairs, REXzyme generates biocatalysts with natural-like properties: they are globular and ordered, and their predicted functionalities match their intended catalytic reactions. Its applications span from biomedicine to environmental sciences.
Talk length: 10 minutes; estimated time: 9:55 am

          Sigrid Kaltenbrunner: Crafting Custom Catalysts: Designed Enzymes for the Morita-Baylis-Hillman Reaction

We report on the use of Riff-Diff, a modeling strategy for scaffolding catalytic arrays in de novo protein backbones to design an enzyme for the Morita-Baylis-Hillman reaction. Using Riff-Diff, 18 de novo sequences derived from 14 unique backbones were created. While the experimental characterization of activity is under investigation, we believe that the approach taken can straightforwardly be applied to different reactions, paving the way to fast solutions to biotechnological challenges.
Talk length: 10 minutes; estimated time: 10:05 am

          Georg Künze: Computational engineering of a metagenome-derived polyester hydrolase for efficient PET depolymerization

Current research focuses on the development of efficient polyester hydrolases for the recycling of polyester plastics such as Poly(ethylene terephthalate) (PET). The recently discovered Polyester Hydrolase Leipzig 7 (PHL7), isolated from a compost metagenome, completely degrades amorphous PET from post-consumer plastic waste in less than a day. However, its application at an industrial level has been hampered by its low stability and short lifetime at low salt concentrations and high temperatures, which are required for efficient PET hydrolysis. Here, we have applied a computer-aided design workflow to engineer PHL7 for improved stability and activity. Multiple mutations were predicted by Rosetta energy calculations and combined, increasing the thermal melting temperature to more than 95°C. Subsequent iterative mutagenesis around the PET binding pocket yielded a PHL7 variant with a >120-fold increased activity and 12.2°C higher melting temperature in 0.1 M buffer. X-ray structure and MD simulation analyses revealed the mechanisms underlying the enzyme’s elevated activity and stability. The new PHL7 variant is better than ICCG in terms of its PET hydrolyzing activity and comparable to the recently reported PETases LCC-A2 and TurboPETase at 65°C. Our results constitute a significant advancement for the engineering of industrially applicable PET hydrolases.
Talk length: 10 minutes; estimated time: 10:15 am

          Cesar Antonio Ramirez-Sarmiento: Improving the activity of PETases and MHETases at varying temperatures using MD simulations and ThermoMPNN

PET hydrolases (PETases) and MHET hydrolases (MHETases) are exciting potential biocatalysts for the recycling of PET. Most known efficient PETases operate at high temperatures, whereas only a few mesophilic MHETases are known. The design of PETases that efficiently degrade PET at lower temperatures is desirable to enable whole-cell biocatalysis, whereas the design of thermostable MHETases could aid in enhancing PET degradation at high temperatures using PETase-MHETase mixtures.
In this short talk, we will describe the discovery of a novel PETase found in an Antarctic organism, Mors1, that operates at 25°C. Using MD simulations to compare the flexibility of the active site between thermophilic and mesophilic PETases, we generated a Mors1 chimera by loop swapping with an extended loop from the thermophilic PETase LCC that has 5-fold more activity and an optimal temperature for activity at 45°C.
Reasoning that thermostable MHETases are required for generating synergistic PETase-MHETase systems to degrade PET at high temperature, we used ThermoMPNN to increase the thermal stability of a mesophilic MHETase. While the wild-type enzyme exhibits higher activity at ~40°C, all tested single-point mutants generated by ThermoMPNN show higher activity at 50°C, with one such mutant exhibiting ~10-fold higher activity than the wild-type MHETase.
Acknowledgements: ICGEB CRP/CHL23-02, Homeworld Collective Garden Grants, EMBO Global Investigator Network.
Talk length: 10 minutes; estimated time: 10:25 am

10:35 - 10:55 Coffee Break

10:55 - 11:30

          Giulia Peteani: Enhancing metalloprotein sequence prediction with synthetic data

Large language models (LLMs) show promise in protein modeling, yet they struggle with metalloproteins due to limited data. Thus, we propose a fixed backbone design to generate synthetic metalloprotein sequences, expanding the training set. Fine-tuning ProtGPT2 on these sequences, we developed models that better generate sequences likely to contain metal-binding sites while preserving quality. Our work shows the power of synthetic data augmentation for training LLMs on specific protein classes.
Talk length: 10 minutes; estimated time: 10:55 am

          Sarel Fleishman: Computational design of functional repertoires of enzymes and antibodies

Abstract to be posted.
Talk length: 25 minutes; estimated time: 11:05 am

Session II - Intrinsically disordered proteins, condensates & MD simulations

11:30 - 12:05

          Kresten Lindorff-Larsen: Towards computational design of disordered proteins and condensates

Intrinsically disordered proteins and regions (collectively IDRs) and proteins with long flexible regions are pervasive across proteomes, help shape biological functions, and are involved in numerous diseases. IDRs populate a diverse set of transiently formed structures yet defy commonly held sequence-structure-function relationships. Recent developments in structure prediction and design of folded proteins have led to the ability to predict the three-dimensional structures of folded proteins at the proteome scale and to design sequences that fold into specific three-dimensional structures. In contrast, knowledge of the conformational properties of IDRs is scarce, in part because the sequences of disordered proteins are poorly conserved and because only a few have been characterized experimentally. In my talk I will describe how we can combine molecular simulations and machine learning methods to study the relationship between sequence, conformational properties, and functions of IDRs.
First, I will describe how we have used experimental data on more than 100 different proteins to learn a coarse-grained molecular energy function (CALVADOS) to predict conformational properties of disordered proteins including their propensity to undergo phase separation. Second, I will briefly describe how CALVADOS makes it possible to perform large-scale simulations to learn the relationship between sequence, structure, and function of IDRs at the proteome level. Third, I will present work on how we can use the information encoded in CALVADOS to design disordered proteins with desired conformational properties. Finally, I will show how we have used molecular simulations to train a model to predict the relationship between protein sequence and propensities to undergo phase separation.
Talk length: 25 minutes; estimated time: 11:30 am

          Rasmus Krogh Norrild: mRNA-display based measurement of disordered protein phase separation through partitioning experiments

Biomolecular condensates have emerged as a new class of cellular membrane-less organelles involved in compartmentalisation, regulation, and signalling. Relying on a multitude of interactions, phase separation of dynamic and weakly interacting protein and RNA molecules contributes to assembly. These interactions are challenging to quantify by traditional methods, limiting the thermodynamic data available for predictive models. Here we present an mRNA-display-based approach to directly measure the energetic contributions of peptide and RNA molecules to the formation of biomolecular condensates at a large scale. Using the intrinsically disordered region (IDR) of DEAD-box helicase 4 (DDX4N1), which is a central germ-granule component, we measure partitioning coefficients of almost 100,000 peptides and corresponding mRNA molecules between the protein-rich and depleted phases. By using partitioning as a proxy for phase separation, we show that peptide fragments of DDX4N1 itself provide high-resolution data on regions of the protein domain responsible for homotypic condensate formation. Additionally, peptide tiles of all other known IDRs form a catalogue of potential clients of the condensates. The combined data inform on general properties of partitioning peptides, which might also extend to rationalising phase separation of unrelated IDRs. RNA partitioning is disfavoured upon secondary structure formation, and we show that purine content promotes partitioning. The unprecedented scale of the data generated by this method allows quantitative evaluation of how condensation behaviour and potential specificity are encoded in protein and RNA.
Talk length: 10 minutes; estimated time: 11:55 am

          Alena Khmelinskaia: Tales of Dynamic Protein Assembly Design

Recent computational methods have been developed for designing novel protein assemblies with atomic-level accuracy, yet several aspects of current methods limit the structural and functional space that can be explored. I will share with you our ongoing efforts in diversifying the structural repertoire of protein assemblies and developing strategies to gain control over assembly dynamics.
Talk length: 25 minutes; estimated time: 12:05 pm

12:30 - 13:30 Lunch

13:30 - 14:30

          Matt O'Meara: Docking to Novel pocKets (DoNK): Charting Virtual Bioactivity Space Through Large-Scale Protein Design and Molecular Simulation

Large-scale chemical foundation models will require synthesizing empirical and theoretical knowledge. To explore the utility of simulation data we have generated DoNK, a dataset of 1,000,000 in-stock ligands docked to 10,000 designed binding sites. We will describe our experience with using it to pre-train ML models for a range of virtual screening and medicinal chemistry prediction tasks.
Talk length: 25 minutes; estimated time: 1:30 pm

          Mykhailo Girych: Quality of disordered protein ensembles in coarse-grained molecular dynamics simulations

Machine learning (ML) models trained on coarse-grained (CG) simulation data are gaining traction for predicting intrinsically disordered protein (IDP) ensembles. Here, we evaluate the accuracy of CG force fields (FFs) by comparing CG IDP ensembles to atomistic simulations and NMR spin relaxation data. Results show that while some CG FFs capture key IDP features, performance varies across different proteins. This highlights the need to evaluate CG FFs for reliable prediction of IDP ensembles.
Talk length: 10 minutes; estimated time: 1:55 pm

          Joseph Rogers: Targeting disordered protein using de novo designed proteins

Human proteins, to a surprising degree, lack folded structure. Lacking structure does not mean lacking function: these regions are important for human biology and feature prominently in proteins associated with disease. However, these dynamic, disordered peptide chains are notoriously hard to bind, and there are few research tools or drug leads targeting these regions. I will describe our hallucination and RFDiffusion efforts to design de novo proteins that bind disordered regions, by inducing them to fold.
Talk length: 25 minutes; estimated time: 2:05 pm

14:30 - 14:50 Coffee Break

Session III - Small molecule interaction and design

14:50 - 16:20

          John Karanicolas: Using generative AI to design small molecules for drug discovery

We describe a new AI-based technique for producing novel chemical structures in silico. Through transfer learning, we demonstrate that the model can be iteratively refined to yield outputs that optimize for specific criteria, including 3D shapes. We applied the model to produce new chemical structures that recapitulate the biological activity of natural products, using much simpler readily-accessible chemical scaffolds.
Talk length: 25 minutes; estimated time: 2:50 pm

          Niklas Gesmar Madsen: Composed Message-Passing for GPCR drug-target prediction: Integrating Knowledge Graphs and Deep Learning.

Drug-target interaction (DTI) databases are vast but sparse, leaving many off-target and adverse drug effects unexplored. We developed a composed message-passing approach to robustly interpolate within this sparse data distribution for G protein-coupled receptors (GPCRs), starting at the molecular graph level and then extending to the chemical neighborhood level (also a graph). Finally, we correlated predictions with over 3,000 DTIs and validated 14 novel DTIs using a GPCR-yeast biosensing platform. The results demonstrate the inherent graph homophily in the data and reconcile knowledge graphs with the deep learning that operates on them.
Talk length: 10 minutes; estimated time: 3:15 pm

          Christoffer Norn: De novo design of GPCR modulators

G-protein coupled receptors (GPCRs) play a crucial role in various physiological processes and are major targets for drug discovery. Designing modulators that stabilize distinct functional states of GPCRs is challenging due to the small differences between these states and complex epitope shapes. We have developed novel metaproteome-based design methods and high-throughput screening techniques, resulting in the discovery of both agonists and antagonists.
Talk length: 10 minutes; estimated time: 3:25 pm

          Daniele Granata: Mega-scale in silico benchmark of de novo design tools for protein therapeutics

To support both our internal design tasks and benchmark new tools, we developed a highly scalable and modular platform for protein design. We performed a large-scale benchmark of state-of-the-art tools for structure-based de novo binder design, generating more than 25M designs in total. The benchmark includes a systematic exploration of therapeutically relevant protein lengths (15-95 aa), the impact of hotspot information and prediction of developability properties for the designed sequences.
Talk length: 10 minutes; estimated time: 3:35 pm

          Deniz Akpinaroglu: Structure-conditioned masked language models for protein sequence design generalize beyond the native sequence space

Machine learning has enabled significant progress in protein sequence design. Frame2seq is a structure-conditioned masked language model that achieves state-of-the-art accuracy. We demonstrate Frame2seq's ability to generalize beyond natural sequences by designing for novelty, including a design with 0% sequence identity to native. Further, we show that our model is uniquely useful for control over the conformational landscape of multi-state proteins. [+ workshop]
Talk length: 10 minutes; estimated time: 3:45 pm

          Rocco Moretti: Crowdsourcing Small Molecules: Evaluating Foldit as a platform for drug design

Foldit is an online citizen science game which allows members of the public to participate in computational structural biology research. We have recently added the ability to design druglike small molecules into the Foldit interface. Foldit game players have successfully participated in several different drug design programs, including CACHE, an independent assessment of computational small molecule design programs.
Talk length: 25 minutes; estimated time: 3:55 pm

Poster Flash Talks

16:20 - 17:00

Poster Session I

17:00 - 18:25

          Poster viewing: even poster numbers

Check poster numbers in the Conference Booklet

Posters and Pizza

18:25 - 20:00

         Dinner with poster viewing

Day 2 - Tuesday, November 12th, 2024

Session IV - ML & Protein Design I

9:00 - 10:25

          Jens Meiler: How artificial intelligence is reshaping protein structure prediction and therapeutic design – from small molecules to new modalities

AlphaFold is revolutionizing protein structure prediction. Not because of its increased accuracy in comparison to the best predictions of prior methods, but because of consistent accuracy across all folded, well-structured proteins. I will discuss some of the remaining challenges such as predicting all biologically relevant conformations of flexible membrane proteins including transporters, ion channels, or receptors.
With the availability of highly accurate structural models for most proteins, structure-based drug discovery is experiencing a renaissance. The availability of large 'make-on-demand' compound libraries coupled with computational ultra-large library screening fundamentally changes the paradigm in (academic) probe and drug development projects to 'in silico' first! I will introduce these concepts and detail several new algorithms to accomplish these tasks.
While AlphaFold is close to a golden bullet for protein structure prediction, computational design of protein and peptide therapeutic candidates is much more challenging even with the use of artificial intelligence. One challenge is that, while the desired goal is function, the design algorithm focuses on structure, implying only that the design might have the target function. A second challenge is the inclusion of chemical space with limited training data such as non-natural amino acids. I will give an overview of several new algorithms developed by us and others combined with illustrative applications.
Talk length: 25 minutes; estimated time: 9:00 am

          Dek Woolfson: From peptides to proteins to functions by design

It is now possible to generate many stable peptide assemblies and proteins from scratch using rational and computational approaches. One new challenge is to move past structures found in nature and target the ‘dark matter of protein space’; that is, structures that should be possible from chemistry and physics, but which biology seems to have overlooked. This talk will illustrate what is currently possible in this nascent field using de novo designed coiled-coil peptides and proteins.
I will describe our “toolkit” of de novo coiled-coil assemblies, and how we are converting these peptide bundles and barrels into single-chain proteins through rationally seeded computational protein design. Then I will turn to subcellular applications. I will describe two new designs for (i) de novo cell-penetrating peptides, and (ii) high-affinity kinesin-binding peptides, and how these can be combined to hijack and control active motor proteins in living cells.
Understanding a protein fold: The physics, chemistry, and biology of alpha-helical coiled coils
DN Woolfson
J Biol Chem 299, ARTN: 104579 (2023). DOI: 10.1016/j.jbc.2023.104579
Rationally seeded computational protein design of α-helical barrels
KI Albanese, R Petrenas, F Pirro, EA Naudin, U Borucu, WM Dawson, DA Scott, GJ Leggett, OD Weiner, TAA Oliver, DN Woolfson
Nat Chem Biol 20, 991-9 (2024). DOI: 10.1038/s41589-024-01642-0
De novo designed peptides for cellular delivery and subcellular localisation
GG Rhys, JA Cross, WM Dawson, HF Thompson, S Shanmugaratnam, NJ Savery, MP Dodding, B Höcker, DN Woolfson
Nat Chem Biol 18, 999- (2022). DOI: 10.1038/s41589-022-01076-6
A de novo designed coiled coil-based switch regulates the microtubule motor kinesin-1
JA Cross, WM Dawson, SR Shukla, JF Weijman, J Mantell, MP Dodding, DN Woolfson
Nat Chem Biol 20, 916-23 (2024). DOI: 10.1038/s41589-024-01640-2
Talk length: 25 minutes; estimated time: 9:25 am

          Sofia Andersson: De novo design of conformational changes using RFDiffusion and ProteinMPNN

RFDiffusion is used to design two structurally unique backbones, where one is a de novo template structure and the other is a redesigned version. This is done using an iterative process based on sequence similarity and structural difference. ProteinMPNN is then used to design sequences for both structures, and the amino acid probabilities are combined to create a sequence which encodes both structures. Two such sequences have been expressed experimentally and CD spectra and NMR show a reversible conformational change between two unique states.
Talk length: 10 minutes; estimated time: 9:50 am

          Elodie Laine: From sequences to fitness and motions, protein language models to the rescue?

I will present our latest work addressing two important questions for protein engineering and human medicine. What is the impact of single-point mutations on protein function? How do proteins move and deform to perform their functions? I will highlight the complementarity between protein language model (pLM)-based predictors and evolutionary- or physics-based approaches. I will discuss some limitations linked to biases in the pLM representation spaces and the ground truth experimental data.
Talk length: 25 minutes; estimated time: 10:00 am

10:25 - 10:50 Coffee Break

Session V - Protein-Protein-Interactions & -Complexes

10:50 - 12:20

          Tina Perica: Functional crosstalk between EGF and insulin signalling

Every cell needs to simultaneously, and in a concerted manner, respond to many different signals. Signalling pathways are, however, often studied in isolation from each other. At the same time, these pathways appear over-regulated, with seemingly redundant regulatory mechanisms. I will present our preliminary work on systematically identifying functional crosstalk between pathways with proteomics as well as our ideas on how to probe the role of feedback loops in maintaining this crosstalk.
Talk length: 25 minutes; estimated time: 10:50 am

          Martin Pačesa: BindCraft: one-shot design of functional protein binders

Protein-protein interactions (PPIs) are at the core of all key biological processes. However, the complexity of the structural features that determine PPIs makes their design challenging. We present BindCraft, an open-source and automated pipeline for de novo protein binder design with experimental success rates of 10-100%. BindCraft leverages the trained deep learning weights of AlphaFold2 to generate nanomolar binders without the need for high-throughput screening or experimental optimization, even in the absence of known binding sites. We successfully designed binders against a diverse set of challenging targets, including cell-surface receptors, common allergens, de novo designed proteins, and multi-domain nucleases, such as CRISPR-Cas9. We showcase their functional and therapeutic potential by demonstrating that designed binders can reduce IgE binding to birch allergen in patient-derived samples. This work represents a significant advancement towards a "one design-one binder" approach in computational design, with immense potential in therapeutics, diagnostics, and biotechnology.
Talk length: 10 minutes; estimated time: 11:15 am

          Valeriia Hatskovska: De novo design and characterization of bispecific cytokines with novel function

Undifferentiated hematopoietic and leukemic stem cells are difficult to target due to their low abundance and lack of unique cell surface markers. Targeting these cells using bivalent binders of more than one cytokine receptor can be an effective strategy. To achieve this, we apply de novo design of bivalent, single-domain binders, using Damietta protein design. Specifically, here we report proteins capable of associating two key hematopoietic receptors (IL-3Ra and G-CSFR) simultaneously. Upon experimental characterization, the five design candidates were well expressed and highly thermostable, and two candidates bound both targets at nanomolar affinities. Our results also showed one of our designs to be capable of associating both receptors simultaneously at nanomolar concentrations.
Talk length: 10 minutes; estimated time: 11:25 am

          Amijai Saragovi: Controlling semiconductor growth with structured de novo protein interfaces

Protein design now enables the precise arrangement of atoms on the nanometer length scale of inorganic crystal nuclei, opening up the possibility of templating semiconductor growth. We designed proteins presenting regularly repeating interfaces containing functional groups that organize ions and water molecules, and characterized their ability to bind to and promote nucleation of ZnO. Utilizing the scattering properties of ZnO nanoparticles, we developed a flow cytometry-based sorting methodology and identified thirteen proteins with ZnO binding interfaces. Three designs promoted ZnO nucleation under conditions where traditional ZnO-binding peptides and control proteins were ineffective. Incorporation of these interfaces into higher order assemblies allowed the organization of defined protein-ZnO composite nanoparticles. These findings demonstrate the potential of using protein design to modulate semiconductor growth and generate protein-semiconductor composite materials.
Talk length: 10 minutes; estimated time: 11:35 am

          Fabio Parmeggiani: Predicting protein-carbohydrate interactions

Protein-carbohydrate interactions are ubiquitous and fundamental for the function of proteoglycans. However, due to low affinity and similarity between carbohydrates, prediction of specificity, validation of docking models and design of protein binders are still poor. Moreover, successful machine learning tools, employed for protein structure prediction and design, require large amounts of high-resolution data that are simply not available for protein-carbohydrate complexes.
In this work, we have developed a 3D graph neural network that processes the geometry and interactions of protein-sugar interfaces in three-dimensional space, using limited and curated sets of high-resolution structures. The predictor takes advantage of sampling atomic features of carbohydrates, making it possible to generalize its application to sugars not included in the training set.
This tool allows us to rapidly classify structures and models of protein-carbohydrate complexes with high accuracy and efficiency, providing guidance to experimental testing of potential binders and design of novel carbohydrate binders.
Talk length: 10 minutes; estimated time: 11:45 am

          Ora Schueler-Furman: Different ways to bind, common way to model?

Recent advances in modeling and deep learning have made it possible to significantly improve the modeling of interactions, including those mediated by short motifs, at least those that form a defined structure. Many interactions, however, can be strong but still retain considerable entropy. How well can these be characterized using the protocols developed for and trained on interactions with defined structure? I will present several examples and discuss where we stand and where we can go.
Talk length: 25 minutes; estimated time: 11:55 am

12:20 - 13:20 Lunch

13:20 - 14:40

          Amy Keating: Prediction and design of protein-peptide interactions

Models trained on protein sequences and/or structures have led to exciting breakthroughs in computational structural biology. We are interested in how such models perform for protein-peptide interactions and how they might be further adapted for this specific task. I will discuss our work investigating where information comes from when using AlphaFold to dock peptides, and our latest results scoring and designing protein-peptide complexes.
Talk length: 25 minutes; estimated time: 1:20 pm

          Lee Schnaider: De novo design and engineering of large functional protein complexes for sequencing and sensing

Protein nanopores are large channel-forming complexes which have considerable potential, from single-molecule detection of small molecules to commercially successful nucleic acid sequencing applications. Sequencing using protein nanopores relies on measuring the ionic current through a nanometer-scale protein pore embedded in a membrane. The baseline current is governed by the physical dimensions, chemical characteristics, and stability of the protein complex.
The CsgG:CsgF protein-peptide pore is an 18-mer protein-peptide assembly (9xCsgG + 9xCsgF), which is part of the E. coli curli biogenesis system. Derivatives of this pore have been used for DNA sequencing, which places high demands on the structural stability and homogeneity of the complex. To increase the robustness of the pore we employed two methods: (i) combining protein engineering with proximity labeling to design CsgF derivatives bearing sulfonyl fluorides, which react with CsgG in very high yield. While proximity labeling is primarily used analytically, we implemented it preparatively, to direct covalent bond formation between the subunits of this 280 kDa protein-peptide complex and covalently stabilize it. (ii) De novo design of highly stable protein-peptide nanopores with predetermined structural features, optimal channel conductance, and enhanced stability. Through this work, we were able to shed light on specific nanopore properties that govern membrane insertion propensity, signal uniformity and stability. Excitingly, derivatives of these designs are now in development at Oxford Nanopore Technologies. I would be very happy to present the methodologies that enabled these designs in an oral presentation.
Talk length: 10 minutes; estimated time: 1:45 pm

          Océane Follonier: Adapting and Assessing the Design of Protein-Protein Interactions for Therapeutic Applications

Structural insights into virus interactions with the host immune system are key to developing novel antivirals and vaccines. Despite the availability of advanced deep learning methods to generate novel proteins, accurately scoring and validating the functionality of these designs remains a significant challenge. We aim to identify the critical factors in evaluating protein designs to ensure they meet specific target properties, such as preserved antigenic interfaces, solubility and stability.
Talk length: 10 minutes; estimated time: 13:55 pm

          Alina Konstantinova: Designing cyclical oligomeric anchors for self-assembling protein fibers

The goal of our work is to design proteins capable of moving along tracks made of self-assembling protein fibers. Here we use a combination of deep learning methods, such as RFdiffusion, ProteinMPNN and AlphaFold2, to design novel C10 oligomers that would rigidly connect to the ends of fibers and serve as anchors. The designs were ranked based on a combination of Rosetta and AlphaFold2 metrics. We present the experimental characterization (including cryoEM) of two rounds of design.
Talk length: 10 minutes; estimated time: 14:05 pm

          Tanja Kortemme: "Expanding sequence & structure space, conformational switches, and synthetic cellular signaling"

I will discuss our recent progress in deep-learning methods for de novo protein design and their applications to engineering new protein architectures, dynamic proteins, and constructing cellular signaling from the ground up.
Talk length: 25 minutes; estimated time: 14:15 pm

14:40 - 15:00 Coffee Break

Session VI - Antibodies

15:00 - 16:25

          Clara Schoeder: Establishment of a computational vaccine design pipeline for pandemic preparedness

Computational protein design has become a standard technology to stabilize viral glycoproteins in their prefusion conformation - the conformation necessary to elicit neutralizing and protective antibody responses. In this project, we want to take this approach one step further and systematically implement protocols and test their performance on a given prefusion stabilization task with experimental validation. We compared protocols from Rosetta and AI-driven sequence design head-to-head.
Talk length: 25 minutes; estimated time: 15:00 pm

          Pietro Sormanni: Computational Strategies for Antibody Design and Developability Optimization

Antibodies are indispensable in research, diagnostics, and as therapeutics. Despite significant advancements in antibody discovery and optimization technologies, challenges remain, particularly in the efficient targeting of predetermined epitopes and the simultaneous optimization of multiple biophysical traits. Traditional screening methods can be labor-intensive and are ineffective at navigating the complex trade-offs between critical properties such as affinity, stability, and solubility. Computational approaches present a promising solution, offering speed, cost-effectiveness, and resource efficiency. In this presentation, I will explore emerging computational methods for antibody design, which enable precise targeting of specific epitopes, accurate prediction of nativeness, nanobody humanization, and the optimization of developability potential through the simultaneous enhancement of multiple biophysical properties. These approaches hold the potential to significantly streamline antibody discovery and optimization, paving the way for rapid advancements in therapeutic and diagnostic applications.
Talk length: 25 minutes; estimated time: 15:25 pm

          Britnie Carpentier: Incorporating Energy Calculations into AI Antibody Structure Prediction

Due to the role of antibodies in the immune system and their specificity when binding to antigens, predicting antibody structure plays a crucial role in the design of effective therapeutics. Experimental techniques to test various structures are typically costly, creating a need for cost-effective methods to predict antibody structures and their potential biophysical properties prior to experimental testing. In recent years, machine learning methods have been at the forefront of antibody structure prediction, though not without limitations. Energy calculations are fundamentally important in finding and understanding the most stable structures and conformations of proteins. Most machine learning structure prediction models do not include energy calculations in their training as they can be inaccurate and chemically implausible, and datasets are limited. The Rosetta Energy Approximation Network (REAN) is a new neural network that approximates the Rosetta Energy Function (REF15) energy terms to a fair degree of accuracy. I propose to integrate REAN into IgFold, a sequence-to-structure prediction model for antibodies, for training and inference, by including the energy approximation into the Invariant Point Attention (IPA) module, a component that IgFold borrowed from AlphaFold2. I will test whether incorporating energy calculations into the training for structure prediction will improve the accuracy of the structure prediction itself, particularly in the hypervariable CDR3 regions. In the long term, I will also improve the energy approximating network and investigate how it can provide insight into the structure’s biophysical properties, such as binding affinity and stability.
Talk length: 10 minutes; estimated time: 15:50 pm

          Possu Huang: A general platform for targeting MHC-II antigens via a single loop

Class-II major histocompatibility complexes (MHC-IIs) are central to the communications between CD4+ T cells and antigen presenting cells (APCs), but intrinsic structural features associated with MHC-II make it difficult to develop a general targeting system with high affinity and antigen specificity. Here, we introduce a protein platform, Targeted Recognition of Antigen-MHC Complex Reporter for MHC-II (TRACeR-II), to enable the rapid development of peptide-specific MHC-II binders.
Talk length: 25 minutes; estimated time: 16:00 pm

Poster Flash Talks

16:25 - 17:00

Poster Session II

17:00 - 18:25

          Poster viewing: odd poster numbers

Check poster numbers in the Conference Booklet

Gala Dinner

19:00 - 22:00

         Celebratory Dinner

Day 3 - Wednesday, November 13th, 2024

Session VII - ML & Protein Design II

9:00 - 10:10

          Noelia Ferruz: Design of functional enzymes with conditional language models

We report the training of conditional language models for the generation of proteins in unseen regions of the protein space. We test experimentally carbonic anhydrases and lactate dehydrogenases that share little similarity with natural proteins yet preserve their activity levels. We additionally describe two new models for the design of new-to-nature enzymes and binders and their dependency on data quality. We introduce techniques for the continual learning of protein language models.
Talk length: 25 minutes; estimated time: 9:00 am

          Matteo Cagiada: Predicting absolute protein folding stability using generative models

While there has been substantial progress in our ability to predict changes in protein stability due to amino acid substitutions, progress has been slow in methods to predict the absolute stability of a protein. In our work, we showed how a generative model for protein sequence can be leveraged to predict absolute protein stability. We benchmarked our predictions across a broad set of proteins and found a mean error of 1.5 kcal/mol and a correlation coefficient of 0.7 for the absolute stability across a range of small to medium-sized proteins of up to ca. 150 amino acid residues. We analysed current limitations and future directions, including how such a model may be useful for predicting conformational free energies. Our approach is simple to use and freely available via an online implementation.
Talk length: 10 minutes; estimated time: 9:25 am

          Max Beining: HyperMPNN ‒ A general strategy to design thermostable proteins learned from hyperthermophilic organisms

Deep learning protein design approaches like ProteinMPNN have shown strong performance both in creating novel proteins and in stabilizing existing ones, with stability being a key factor to enable the use of recombinant proteins in therapeutic or biotechnological applications. Nevertheless, the resulting stability of designs is unlikely to significantly surpass that of natural proteins in the training set, which tend to be only marginally stable. Here, we collected predicted protein structures from hyperthermophilic organisms, which differ significantly in their amino acid composition from mesophilic organisms. We show that ProteinMPNN fails at recovering this unique amino acid composition and subsequently retrained the network on hyperthermophilic proteins. The result, termed HyperMPNN, not only recovers this unique amino acid composition but can also be applied to non-hyperthermophilic proteins. Next, we experimentally verified our approach by stabilizing existing proteins. In conclusion, we created a new way to design highly thermostable proteins through self-supervised learning on data from hyperthermophilic organisms.
Talk length: 10 minutes; estimated time: 9:35 am

          Mohammed Alquraishi: Some observations on how AlphaFold predicts structures and how it learns to predict structure

In this talk I will discuss some recent evidence regarding the degree to which AlphaFold appears to learn to do implicit physics, as well as how this knowledge is acquired during the training process. Time permitting I will also discuss differences between AlphaFold 2 and 3 with regards to the question of implicit physical knowledge.
Talk length: 15 minutes; estimated time: 9:45 am

10:10 - 10:30 Coffee Break

10:30 - 12:00

          Joanna Slusky: Protein Design Insights from Large Datasets

Protein design relies on a deep understanding of the mimicked protein category. By constructing a large dataset of outer membrane proteins, we discovered features of outer membrane protein biogenesis and evolution. With a second large dataset—of metalloproteins—we revealed key differences between metal binding sites that can and can’t catalyze reactions.
Talk length: 25 minutes; estimated time: 10:30 am

          Dominique Fastus: Synonymous codon selection bias and its impact on evolutionary co-translational protein folding

Computational analyses of how certain motifs in codons relate to protein structures are limited, as these two levels have mostly been treated separately in existing in silico studies. Here we obtained nucleotide sequences for a large-scale analysis of codon usage bias in correlation with secondary structures based on AlphaFold predictions. We also studied the conservation of rare codons in different protein families and investigated patterns of certain motifs and synonymous codons.
Talk length: 10 minutes; estimated time: 10:55 am

          Thea Klarsø Schulze: Learning sequence-abundance relationships across proteins from large-scale mutagenesis datasets

Accumulation of data from multiplexed assays of variant effects (MAVEs) has enabled large-scale analyses of variant effects in proteins and created the opportunity to train supervised models directly against experimental data to learn sequence-function relationships, including how missense variants affect protein stability and abundance. We have used data obtained by variant abundance by massively parallel sequencing (VAMP-seq), a MAVE technique that quantifies the steady-state cellular abundance of protein variants, to create a model for predicting the impact of missense variation on cellular abundance across proteins. We used data reporting on the effects of ca. 32,000 missense variants on the abundance in six proteins as training data for a deep learning model. The model consists of (i) an inverse folding model taking protein structure as input, (ii) a supervised model for effects on protein stability and abundance, (iii) a function mapping stability and abundance effects to folding probability, and (iv) a downstream model that describes the experimental process. Our model predicts variant abundance with state-of-the-art accuracy.
Talk length: 10 minutes; estimated time: 11:05 am
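
Step (iii) of the model described above maps stability to a folding probability. One common way to express such a mapping is the two-state Boltzmann relation, sketched below. Whether the authors use this form is an assumption, as are the function name, the default temperature, and the sign convention (positive values mean the folded state is favored).

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def folded_fraction(dg_unfold, temperature=298.15):
    """Two-state fraction folded for an unfolding free energy in kcal/mol.

    dg_unfold = G(unfolded) - G(folded); positive values favor the folded state.
    """
    return 1.0 / (1.0 + math.exp(-dg_unfold / (R_KCAL * temperature)))


# Illustrative values only: a destabilized, a marginal and two stable proteins.
for dg in (-2.0, 0.0, 2.0, 5.0):
    print(f"dG_unfold = {dg:+.1f} kcal/mol -> fraction folded = {folded_fraction(dg):.3f}")
```

Under this assumed form the curve is sigmoidal, so a variant that removes a few kcal/mol from an already marginal protein changes the folded population far more than the same change in a very stable one.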

          Katharina Bachschwöller: Design of a potentially novel FAD synthetase by fragment-based chimeragenesis

The chimeric protein TyrAFld was constructed, comprising an ATP-binding fragment derived from the Rossmann fold domain of HiTyrA recombined with a fragment from the FMN-binding flavodoxin-like domain of DgFld, to catalyze the formation of FAD. After initial solubility problems and sequence optimization with PROSS and ProteinMPNN, the designed TyrAFld variants are expressed in soluble form and in high yields. Biochemical characterization indicates correct folding, high thermal stability and ATP binding.
Talk length: 10 minutes; estimated time: 11:15 am

          Andrea Hunklinger: Protein Design with Explainable Artificial Intelligence

State-of-the-art protein language models (pLMs) excel at generating proficient proteins across diverse families, but they unfortunately operate as black boxes. We apply explainable artificial intelligence (XAI) techniques, such as influence functions, feature attribution methods and the analysis of Transformer components, to the field of enzyme design to enhance our understanding of the protein language, and we use the insights to improve the models' generation or downstream prediction tasks.
Talk length: 10 minutes; estimated time: 11:25 am

          Sergey Ovchinnikov: Inverting Protein Structure Prediction models for protein design.

In this talk I'll describe some recent advances in inverting protein structure prediction models for protein design, contrasting them with more recent diffusion and flow-based methods. I'll go through some examples of designing cyclic peptides, large new folds and protein binders.
Talk length: 25 minutes; estimated time: 11:35 am

12:00 - 12:20 Closing Remarks

12:20 - 13:15 Lunch

Session VIII - Parallel workshop sessions

13:15 - 14:30

          Panel Discussion: Industry Careers

Panelists: Johanna Tiemann (Novonesis); Che Yang (Novo Nordisk); Alexandra Chivu (Flagship Pioneering); John Karanicolas (AbbVie); Jonathan Ziegler (Cradle); Dana Cortade (Align to Innovate); Marloes Arts (Genmab); Zander Harteveld (Orbis Medicines)
Moderator: Roland Pache (Novonesis)
Location: Lundbeck Auditorium

          From Lab to Docket: Supporting government policy development on chemical and biological AI models

Presented by: Samuel Curtis
Description: As biomolecular design capabilities improve, governments are working to understand how these computational advances relate to broader safety and security considerations. The U.S. government recently issued a Request for Information (RFI) on “Safety Considerations for Chemical and/or Biological Models” [Docket No. 240920-0247]. Responses will inform the development of biosecurity evaluations and mitigations in the U.S., and are likely to influence similar efforts internationally, including in Europe. In this participative workshop, we’ll walk through the process of providing policy input, with the goal of drafting a response to this RFI. We’ll discuss the RFI process, analyze questions contained in the RFI, and home in on key messages that could positively shape the development of emerging policy frameworks affecting this domain of science.
Location: 4-0-24

          Sharing Protein AI Models Using Huggingface

Presented by: Simon Duerr
Description: Reproducing published computational workflows can be difficult because, even when the code is available, it is often hard to figure out the exact environment and parameters to use. Web servers, while practical, often disappear from the web over time. This workshop will introduce ways to quickly turn any kind of model pipeline into an easy-to-use web app that other researchers can quickly deploy on their local machines and that can also be used directly in a web browser. The workshop will introduce you to the concept of a Docker container, the UI framework Gradio, and how to deploy your model on HuggingFace Spaces. We will also demonstrate how one can interact with the deployed model programmatically by creating a PyMOL extension using a REST API.
Target audience: People with some familiarity with programming or bash scripting who want to share model pipelines on the web
Required hardware: Laptop with code editor installed and access to a Linux terminal or alternatively a web browser.
Location: 4-0-10

          Rosetta Data Bazaar Hack-a-thon (3 hours)

Led by: Matt O'Meara
Description: In teams of 2-3 participants we will collaboratively curate and make available biomolecular structure and activity datasets for the Rosetta HuggingFace (https://huggingface.co/RosettaCommons). Each team will select a published or in-house dataset developed or used by the Rosetta/structural biology community and create a HuggingFace Dataset so that it can be loaded for machine learning with a single line of code. Depending on the dataset, curation may require some simple scripting, but participants are welcome to join a team even if they have limited coding experience. At least one laptop is required per team.
Location: 4-0-32

          Future of Rosetta

Presented by: James Vasile
Description: Ongoing work around the sustainability of Rosetta's community, revenue, and technical value proposition
Location: 4-0-13

14:30 - 14:50 Coffee Break

14:50 - 16:25

          Panel Discussion: Academic Careers

Panelists: Jens Meiler, Ora Schueler-Furman, Kresten Lindorff-Larsen, Amy Keating; and others
Location: Lundbeck Auditorium

          Innovating on the dataset-to-model design cycle for protein function prediction

Presented By:
Description: Align to Innovate creates living datasets for machine learning and deploys them via community benchmarking opportunities. If you took control of this cycle to generate better protein sequence-to-function prediction models, what kinds of protein libraries would you design to test? What published models would you want to see go head-to-head? How would you maximize the amount of information gleaned per variant tested (or per dollar spent on synthesis)? Join us to map out how the field should handle continuous evaluation and benchmarking of methods as more data comes into the ecosystem.
Location: 4-0-24

          Rosetta Data Bazaar Hack-a-thon (3 hours)

Led by: Matt O'Meara
Description: In teams of 2-3 participants we will collaboratively curate and make available biomolecular structure and activity datasets for the Rosetta HuggingFace (https://huggingface.co/RosettaCommons). Each team will select a published or in-house dataset developed or used by the Rosetta/structural biology community and create a HuggingFace Dataset so that it can be loaded for machine learning with a single line of code. Depending on the dataset, curation may require some simple scripting, but participants are welcome to join a team even if they have limited coding experience. At least one laptop is required per team.
Location: 4-0-32

          Frame2seq: generalizable method for tuning multi-state conformational equilibria

Presented By: Deniz Akpinaroglu
Description: While recent advances in computational protein design have led to an increased experimental success rate in designing stable single-state proteins, reliably tuning the dynamics of multi-state proteins remains a challenge. The workshop will demonstrate a generalizable method for tuning multi-state conformational equilibria using Frame2seq. We will show that model scores are consistently predictive of changes to the ratio of conformational populations. We will also use Frame2seq to identify mutation sites that will result in the most significant shifts to conformational switch equilibria, as computationally validated with AlphaFold2.
Location: 4-0-10

If you need help with registration please Contact Us.

RosettaCon is a place where Rosetta Commons member laboratories and invited guests share their latest experimental and computational research breakthroughs in macromolecular engineering and structure prediction.

For more information about Rosetta Commons please go to the Rosetta Commons Website