Data Sciences Platform / en New machine learning-based single-cell search engine makes cell annotation faster, more efficient /news/new-machine-learning-based-single-cell-search-engine-makes-cell-annotation-faster-more <span class="field field--name-title field--type-string field--label-hidden"><h1>New machine learning-based single-cell search engine makes cell annotation faster, more efficient</h1> </span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"> <span>By Corie Lok</span> </span> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2024-10-28T10:55:51-04:00" class="datetime">October 28, 2024</time> </span> <div class="hero-section container"> <div class="hero-section__row row"> <div class="hero-section__content hero-section__content_left col-6"> <div class="hero-section__title"> <div class="block block-layout-builder block-field-blocknodelong-storytitle"> <span class="field field--name-title field--type-string field--label-hidden"><h1>New machine learning-based single-cell search engine makes cell annotation faster, more efficient</h1> </span> </div> </div> <div class="hero-section__description"> <div class="block block-layout-builder block-field-blocknodelong-storybody"> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>In a Q&amp;A, machine learning expert Mehrtash Babadi introduces Cell Annotation Service, a search engine for single-cell data that he and his group have developed for biologists.</p> </div> </div> </div> <div class="hero-section__author"> <div
class="block block-layout-builder block-extra-field-blocknodelong-storyextra-field-author-custom"> By Corie Lok </div> </div> <div class="hero-section__date"> <div class="block block-layout-builder block-field-blocknodelong-storycreated"> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2024-10-28T10:55:51-04:00" title="Monday, October 28, 2024 - 10:55" class="datetime">October 28, 2024</time> </span> </div> </div> </div> <div class="hero-section__right col-6"> <div class="hero-section__image"> <div class="block block-layout-builder block-field-blocknodelong-storyfield-image"> <div class="field field--name-field-image field--type-entity-reference field--label-hidden field__item"> <article class="media media--type-image media--view-mode-multiple-content-types-header"> <div class="field field--name-field-media-image field--type-image field--label-hidden field__item"> <picture> <source srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=KMCK4-q2 1x" media="all and (min-width: 1921px)" type="image/png" width="754" height="503"> <source srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=KMCK4-q2 1x" media="all and (min-width: 1601px) and (max-width: 1920px)" type="image/png" width="754" height="503"> <source srcset="/files/styles/multiple_ct_header_desktop/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=GwmSutWH 1x" media="all and (min-width: 1340px) and (max-width: 1600px)" type="image/png" width="736" height="520"> <source srcset="/files/styles/multiple_ct_header_laptop/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=1wmbgGo3 1x" media="all and (min-width: 800px) and (max-width: 1339px)" type="image/png" width="641" height="451"> <source srcset="/files/styles/multiple_ct_header_tablet/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=6SC_ihEQ 1x" media="all and (min-width: 540px) and (max-width: 799px)" 
type="image/png" width="706" height="417"> <source srcset="/files/styles/multiple_ct_header_phone/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=50nHF2cf 1x" media="all and (max-width: 539px)" type="image/png" width="499" height="294"> <img loading="eager" width="499" height="294" src="/files/styles/multiple_ct_header_phone/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=50nHF2cf" alt="Illustration depicting a search engine focused on cells and displaying search results" title="Illustration depicting a search engine focused on cells and displaying search results" typeof="foaf:Image"> </picture> </div> <div class="media-caption"> <div class="media-caption__credit"> Credit: Ricardo Job-Reese, Broad Communications </div> <div class="media-caption__description"> </div> </div> </article> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main"> <div class="block block-layout-builder block-field-blocknodelong-storyfield-content-paragraphs"> <div class="field field--name-field-content-paragraphs field--type-entity-reference-revisions field--label-hidden field__items"> <div class="field__item"> <div class="paragraph paragraph--type--text-narrow paragraph--view-mode--default"> <div class="clearfix text-formatted field field--name-field-text field--type-text-long field--label-hidden field__item"><p>One of the first steps for researchers in studying and analyzing single cells is to determine the cells’ identity: what type and subtype of cells are these, and how similar or different are they to previously analyzed cells? Scientists then annotate the cells with this information, a process that can take days or even weeks, depending on the number of cells being labeled, and requires labor-intensive literature and database searches.&nbsp;</p> <p>To speed up the annotation step, the Broad Institute’s Data Sciences Platform (DSP) has developed a new search engine that automates much of this process by using machine learning to search data on more than 50 million annotated single cells.
The tool, <a href="https://cellarium.ai/tool/cellarium-cell-annotation-service-cas/">Cell Annotation Service</a> (CAS), promises to reduce cell annotation time from many hours to just one, and was recently <a href="https://cellarium.ai/cell-annotation-service-cas-access/">released in beta mode for scientists to use</a>.</p> <p>To learn more about CAS, we spoke with <a href="/bios/mehrtash-babadi">Mehrtash Babadi</a>, an institute scientist and director of computational methods in DSP. Babadi leads the group that built the new tool.</p> <p> <strong>How does CAS work?</strong></p> <p>CAS uses some of the same techniques behind reverse image search, which uses a search engine to find other images similar to the image you want to identify. We wanted to build a tool like that for cell biology. So we took lots of reference single-cell RNA sequencing data from atlases and used our scalable machine learning algorithms to embed all of the gene expression data on these cells into compact vector representations; you can think of these as a signature for each cell.&nbsp;</p> <p>When you have a new cell you’re interested in studying, you can use CAS to compare and match your new cell with all these reference cells based on their signatures, and nominate cells that are similar to yours. It’s basically a search engine. You give it a cell, and it shows you similar cells. And when you give it a single-cell dataset, it generates annotations and labels for you by doing this search and carrying over the labels from similar cells to your cells.</p> <p> <strong>How did you build the search engine?</strong></p> <p>Several components of CAS were initially funded by the NIH through the Center for Human Brain Variation at the Broad Institute, where I serve as a co-investigator.
We developed the <a href="https://cellarium.ai/tool/cellarium-ai-platform/">Cellarium AI platform</a>, which powers CAS, to support researchers at the center analyzing massive datasets generated from studying hundreds of human brains, spanning multiple brain regions and tens of thousands of cells per region. Around 2022, we were in discussions with 10x Genomics about potential collaborative research projects. During these conversations, we realized that the platform could be applied beyond its initial scope. CAS emerged as one of these applications, with additional funding provided by 10x Genomics.</p> <p>As the first step, we built a software platform that could store vast amounts of single-cell data, query those data, and then use them to train large machine learning models and generate these embeddings, or signatures, from lots of single-cell data. We trained our models on close to 87 million cells from nearly 1,400 published studies: all of the cells in the <a href="https://cellxgene.cziscience.com/">CZ CELLxGENE</a> repository, which has been built and curated by the Chan Zuckerberg Initiative. CZ CELLxGENE made sure these datasets were harmonized at the level of the metadata attached to the cells, which made the datasets really useful for machine learning.</p> <p> <strong>Can you give some examples of how biologists can use CAS and what they can learn from it?</strong></p> <p>One application is determining cell type. Let’s say you have a cell and you know its gene expression profile. You want to know: what is the crude type of the cell? Is it a T cell? If it's a T cell, is it a CD8+ T cell? If it is, is it like a naive, thymus-derived CD8+ T cell?
Just by entering the gene expression profile of your new cell, you can narrow down the possibilities of what cell type you're dealing with.</p> <p>Another application is to identify whether the cell state you’re seeing is something typically encountered in tissues from healthy donors or in tissues from people with a particular disease. You can also ask: is this cell specific to the tissue you are studying or is it common to multiple tissues?&nbsp;</p> <p>Let’s say you have a therapeutic that is targeting a specific cell state identified in the context of a certain disease. You may want to know whether the same disease mechanism that is driven by these cells is present in other diseases. If the answer is yes, then you have a good hypothesis to extend the indication for that therapeutic to now include the new diseases.&nbsp;</p> <p> <strong>Is CAS now available to use?</strong></p> <p>Yes. The CAS model and framework we developed in collaboration with 10x Genomics is now offered to users in 10x Genomics’ Cloud Analysis Automated Cell Annotation pipeline. 10x Genomics is a provider of instruments and assays for single-cell analysis and the first interaction many users have with their single-cell data is through 10x software. We thought it would be interesting if that initial interaction could be more informative, so that not only would you see technical information about your experiment, such as the number of sequenced cells and their quality, but you’d also be able to learn more about those cells, like what cell types they are and all of the things we’ve talked about here.</p> <p>To make CAS accessible to a broader audience, including those looking to integrate the service into their own interactive or batch analysis workflows, we’re launching our implementation of CAS as a public beta service. 
Users can sign up by navigating to the <a href="https://cellarium.ai/tool/cellarium-cell-annotation-service-cas/">CAS landing page</a>, scrolling to the bottom of the page, and filling out the <a href="https://cellarium.ai/cell-annotation-service-cas-access/">sign-up form</a>.&nbsp;</p> <p>During the beta phase, CAS is offered at no cost, with a usage limit of 100,000 individually annotated cells per week and 200,000 individually annotated cells in total. This quota lets us provide the service to a larger and more diverse user base. Currently, the embedding model powering CAS is the same as the cell annotation pipeline offered by 10x Genomics, though future models and features may evolve separately in alignment with the development roadmaps of each organization.</p> <p> <strong>Overall, how can AI help advance cell biology?&nbsp;</strong></p> <p>One way is to make information more accessible and more integrated. We’re hoping CAS is taking the first step in that direction, by just making information more findable.&nbsp;</p> <p>The second way is to integrate all of the cell biology knowledge we have accumulated and keep accumulating into a cohesive fabric. Nowadays the paradigm is to build very large foundation models that have an integrated understanding of all the data that we've generated. This would allow us to make good predictions, by fine-tuning these models on cellular perturbation experiments, and this could potentially help unravel mechanisms underlying cell function that have remained hidden so far. This second problem is a different type of problem and it’s much harder. It's not just about being able to make old data more easily findable, but it's about being able to synthesize new data based on old data. 
That is our main vision for the future of all the work we're doing.</p> </div> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main"> <div class="block-node-broad-tags block block-layout-builder block-field-blocknodelong-storyfield-broad-tags"> <div class="block-node-broad-tags__row"> <div class="block-node-broad-tags__title">Tags:</div> <div class="field field--name-field-broad-tags field--type-entity-reference field--label-hidden field__items"> <div class="field__item"><a href="/broad-tags/data-science" hreflang="en">Data Sciences Platform</a></div> <div class="field__item"><a href="/broad-tags/machine-learning-0" hreflang="en">Machine Learning</a></div> <div class="field__item"><a href="/broad-tags/single-cell" hreflang="en">Single Cell</a></div> </div> </div> </div> </div> </div> Mon, 28 Oct 2024 14:55:51 +0000 Corie Lok 5557581 at Q&A: How Terra became a backbone of public health pathogen surveillance /news/qa-how-terra-became-backbone-public-health-pathogen-surveillance Mon, 30 Sep 2024 18:08:10 +0000 adicorat 5557411 at Q&A: How generative AI could help accelerate biomedical research /news/qa-how-generative-ai-could-help-accelerate-biomedical-research <span class="field field--name-title field--type-string field--label-hidden"><h1>New machine learning-based single-cell search engine makes cell annotation faster, more efficient</h1> </span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"> <span>By Corie Lok</span> </span> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2024-10-28T10:55:51-04:00" class="datetime">October 28, 2024</time> </span> <div class="hero-section container"> <div class="hero-section__row row"> <div class="hero-section__content hero-section__content_left col-6"> <div class="hero-section__breadcrumbs"> <div class="block block-system block-system-breadcrumb-block"> <nav class="breadcrumb" role="navigation" aria-labelledby="system-breadcrumb"> <h2 id="system-breadcrumb" class="visually-hidden">Breadcrumb</h2> <ol> <li> <a href="/">Home</a> </li> <li> <a href="/news">News</a> </li> </ol>
</nav> </div> </div> <div class="hero-section__title"> <div class="block block-layout-builder block-field-blocknodelong-storytitle"> <span class="field field--name-title field--type-string field--label-hidden"><h1>New machine learning-based single-cell search engine makes cell annotation faster, more efficient</h1> </span> </div> </div> <div class="hero-section__description"> <div class="block block-layout-builder block-field-blocknodelong-storybody"> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>In a Q&amp;A, machine learning expert Mehrtash Babadi introduces Cell Annotation Service, a search engine for single-cell data that he and his group have developed for biologists.</p> </div> </div> </div> <div class="hero-section__author"> <div class="block block-layout-builder block-extra-field-blocknodelong-storyextra-field-author-custom"> By Corie Lok </div> </div> <div class="hero-section__date"> <div class="block block-layout-builder block-field-blocknodelong-storycreated"> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2024-10-28T10:55:51-04:00" title="Monday, October 28, 2024 - 10:55" class="datetime">October 28, 2024</time> </span> </div> </div> </div> <div class="hero-section__right col-6"> <div class="hero-section__image"> <div class="block block-layout-builder block-field-blocknodelong-storyfield-image"> <div class="field field--name-field-image field--type-entity-reference field--label-hidden field__item"> <article class="media media--type-image media--view-mode-multiple-content-types-header"> <div class="field field--name-field-media-image field--type-image field--label-hidden field__item"> <picture> <source srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=KMCK4-q2 1x" media="all and (min-width: 1921px)" type="image/png" width="754" height="503"> <source 
srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=KMCK4-q2 1x" media="all and (min-width: 1601px) and (max-width: 1920px)" type="image/png" width="754" height="503"> <source srcset="/files/styles/multiple_ct_header_desktop/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=GwmSutWH 1x" media="all and (min-width: 1340px) and (max-width: 1600px)" type="image/png" width="736" height="520"> <source srcset="/files/styles/multiple_ct_header_laptop/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=1wmbgGo3 1x" media="all and (min-width: 800px) and (max-width: 1339px)" type="image/png" width="641" height="451"> <source srcset="/files/styles/multiple_ct_header_tablet/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=6SC_ihEQ 1x" media="all and (min-width: 540px) and (max-width: 799px)" type="image/png" width="706" height="417"> <source srcset="/files/styles/multiple_ct_header_phone/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=50nHF2cf 1x" media="all and (max-width: 539px)" type="image/png" width="499" height="294"> <img loading="eager" width="499" height="294" src="/files/styles/multiple_ct_header_phone/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=50nHF2cf" alt="Illustration depicting a search engine focused on cells and displaying search results" title="Illustration depicting a search engine focused on cells and displaying search results" typeof="foaf:Image"> </picture> </div> <div class="media-caption"> <div class="media-caption__credit"> Credit: Ricardo Job-Reese, Ó³»­´«Ã½ Communications </div> <div class="media-caption__description"> </div> </div> </article> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main"> <div class="block block-better-social-sharing-buttons block-social-sharing-buttons-block"> <div style="display: none"><link rel="preload" 
href="/modules/contrib/better_social_sharing_buttons/assets/dist/sprites/social-icons--no-color.svg" as="image" type="image/svg+xml" crossorigin="anonymous"></div> <div class="social-sharing-buttons"> <a href="https://www.facebook.com/sharer/sharer.php?u=/taxonomy/term/956/feed&amp;title=" target="_blank" title="Share to Facebook" aria-label="Share to Facebook" class="social-sharing-buttons-button share-facebook" rel="noopener"> <svg aria-hidden="true" width="32px" height="32px" style="border-radius:100%;"> <use href="/modules/contrib/better_social_sharing_buttons/assets/dist/sprites/social-icons--no-color.svg#facebook" /> </svg> </a> <a href="https://twitter.com/intent/tweet?text=+/taxonomy/term/956/feed" target="_blank" title="Share to X" aria-label="Share to X" class="social-sharing-buttons-button share-x" rel="noopener"> <svg aria-hidden="true" width="32px" height="32px" style="border-radius:100%;"> <use href="/modules/contrib/better_social_sharing_buttons/assets/dist/sprites/social-icons--no-color.svg#x" /> </svg> </a> <a href="mailto:?subject=&amp;body=/taxonomy/term/956/feed" title="Share to Email" aria-label="Share to Email" class="social-sharing-buttons-button share-email" target="_blank" rel="noopener"> <svg aria-hidden="true" width="32px" height="32px" style="border-radius:100%;"> <use href="/modules/contrib/better_social_sharing_buttons/assets/dist/sprites/social-icons--no-color.svg#email" /> </svg> </a> </div> </div> <div class="block block-layout-builder block-field-blocknodelong-storyfield-content-paragraphs"> <div class="field field--name-field-content-paragraphs field--type-entity-reference-revisions field--label-hidden field__items"> <div class="field__item"> <div class="paragraph paragraph--type--text-narrow paragraph--view-mode--default"> <div class="clearfix text-formatted field field--name-field-text field--type-text-long field--label-hidden field__item"><p>One of the first steps for researchers in studying and analyzing single cells is to 
determine the cells’ identity: what type and subtype of cells are these, and how similar or different are they to previously analyzed cells? Scientists then annotate the cells with this information, a process that can take days or even weeks, depending on the number of cells being labeled, and requires labor-intensive literature and database searches.&nbsp;</p> <p>To speed up the annotation step, the Ó³»­´«Ã½â€™s Data Sciences Platform (DSP) has developed a new search engine that automates much of this process by using machine learning to search data on more than 50 million annotated single cells. The tool, <a href="https://cellarium.ai/tool/cellarium-cell-annotation-service-cas/">Cell Annotation Service</a> (CAS), promises to reduce cell annotation time from many hours to just one, and was recently <a href="https://cellarium.ai/cell-annotation-service-cas-access/">released in beta mode for scientists to use</a>.</p> <p>To learn more about CAS, we spoke with <a href="/bios/mehrtash-babadi">Mehrtash Babadi</a>, an institute scientist and director of computational methods in DSP. Babadi leads the group that built the new tool.</p> <p> <strong>How does CAS work?</strong></p> <p>CAS uses some of the same techniques behind reverse image search, which uses a search engine to find other images similar to the image you want to identify. We wanted to build a tool like that for cell biology. So we took lots of reference single-cell RNA sequencing data from atlases and used our scalable machine learning algorithms to embed all of the gene expression data on these cells into compact vector representations — you can think of these as a signature for each cell.&nbsp;</p> <p>When you have a new cell you’re interested in studying, you can use CAS to compare and match your new cell with all these reference cells based on their signatures, and nominate cells that are similar to yours. It’s basically a search engine. You give it a cell, and it shows you similar cells. 
And when you give it a single-cell dataset, it generates annotations and labels for you by doing this search and carrying over the labels from similar cells to your cells.</p> <p> <strong>How did you build the search engine?</strong></p> <p>Several components of CAS were initially funded by the NIH through the Center for Human Brain Variation at the Ó³»­´«Ã½, where I serve as a co-investigator. We developed the <a href="https://cellarium.ai/tool/cellarium-ai-platform/">Cellarium AI platform</a>, which powers CAS, to support researchers at the center analyzing massive datasets generated from studying hundreds of human brains, spanning multiple brain regions and tens of thousands of cells per region. Around 2022, we were in discussions with 10x Genomics about potential collaborative research projects. During these conversations, we realized that the platform could be applied beyond its initial scope. CAS emerged as one of these applications, with additional funding provided by 10x Genomics.</p> <p>As the first step, we built a software platform that could store vast amounts of single-cell data, query out these data, and then use that data to train large machine learning models and generate these embeddings, or signatures, from lots of single-cell data. We trained our models on close to 87 million cells from nearly 1,400 published studies — all of the cells in the <a href="https://cellxgene.cziscience.com/">CZ CELLxGENE</a> repository, which has been built and curated by the Chan Zuckerberg Initiative. CZ CELLxGENE made sure these datasets were harmonized at the level of the metadata attached to the cells, which made the datasets really useful for machine learning.</p> <p> <strong>Can you give some examples of how biologists can use CAS and what they can learn from it?</strong></p> <p>One application is determining cell type. Let’s say you have a cell and you know its gene expression profile. You want to know: what is the crude type of the cell? Is it a T cell? 
If it's a T-cell, is it a CD8+ T cell? If it is, is it like a naive, thymus-derived CD8+ T cell? Just by entering the gene expression profile of your new cell, you can narrow down the possibilities of what cell type you're dealing with.</p> <p>Another application is to identify whether the cell state you’re seeing is something typically encountered in tissues from healthy donors or in tissues from people with a particular disease. You can also ask: is this cell specific to the tissue you are studying or is it common to multiple tissues?&nbsp;</p> <p>Let’s say you have a therapeutic that is targeting a specific cell state identified in the context of a certain disease. You may want to know whether the same disease mechanism that is driven by these cells is present in other diseases. If the answer is yes, then you have a good hypothesis to extend the indication for that therapeutic to now include the new diseases.&nbsp;</p> <p> <strong>Is CAS now available to use?</strong></p> <p>Yes. The CAS model and framework we developed in collaboration with 10x Genomics is now offered to users in 10x Genomics’ Cloud Analysis Automated Cell Annotation pipeline. 10x Genomics is a provider of instruments and assays for single-cell analysis and the first interaction many users have with their single-cell data is through 10x software. We thought it would be interesting if that initial interaction could be more informative, so that not only would you see technical information about your experiment, such as the number of sequenced cells and their quality, but you’d also be able to learn more about those cells, like what cell types they are and all of the things we’ve talked about here.</p> <p>To make CAS accessible to a broader audience, including those looking to integrate the service into their own interactive or batch analysis workflows, we’re launching our implementation of CAS as a public beta service. 
Users can sign up by navigating to the <a href="https://cellarium.ai/tool/cellarium-cell-annotation-service-cas/">CAS landing page</a>, scrolling to the bottom of the page, and filling out the <a href="https://cellarium.ai/cell-annotation-service-cas-access/">sign-up form</a>.&nbsp;</p> <p>During the beta phase, CAS is offered at no cost, with a usage limit of 100,000 individually annotated cells per week and 200,000 individually annotated cells in total. This quota lets us provide the service to a larger and more diverse user base. Currently, the embedding model powering CAS is the same as the cell annotation pipeline offered by 10x Genomics, though future models and features may evolve separately in alignment with the development roadmaps of each organization.</p> <p> <strong>Overall, how can AI help advance cell biology?&nbsp;</strong></p> <p>One way is to make information more accessible and more integrated. We’re hoping CAS is taking the first step in that direction, by just making information more findable.&nbsp;</p> <p>The second way is to integrate all of the cell biology knowledge we have accumulated and keep accumulating into a cohesive fabric. Nowadays the paradigm is to build very large foundation models that have an integrated understanding of all the data that we've generated. This would allow us to make good predictions, by fine-tuning these models on cellular perturbation experiments, and this could potentially help unravel mechanisms underlying cell function that have remained hidden so far. This second problem is a different type of problem and it’s much harder. It's not just about being able to make old data more easily findable, but it's about being able to synthesize new data based on old data. 
That is our main vision for the future of all the work we're doing.</p> </div> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main"> <div class="block-node-broad-tags block block-layout-builder block-field-blocknodelong-storyfield-broad-tags"> <div class="block-node-broad-tags__row"> <div class="block-node-broad-tags__title">Tags:</div> <div class="field field--name-field-broad-tags field--type-entity-reference field--label-hidden field__items"> <div class="field__item"><a href="/broad-tags/data-science" hreflang="en">Data Sciences Platform</a></div> <div class="field__item"><a href="/broad-tags/machine-learning-0" hreflang="en">Machine Learning</a></div> <div class="field__item"><a href="/broad-tags/single-cell" hreflang="en">Single Cell</a></div> </div> </div> </div> </div> </div> Thu, 02 Nov 2023 14:30:22 +0000 Corie Lok 5555971 at Machine learning model finds genetic factors for heart disease /news/machine-learning-model-finds-genetic-factors-heart-disease <span class="field field--name-title field--type-string field--label-hidden"><h1>New machine learning-based single-cell search engine makes cell annotation faster, more efficient</h1> </span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"> <span>By Corie Lok</span> </span> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2024-10-28T10:55:51-04:00" class="datetime">October 28, 2024</time> </span> <div class="hero-section container"> <div class="hero-section__row row"> <div class="hero-section__content hero-section__content_left col-6"> <div class="hero-section__breadcrumbs"> <div class="block block-system block-system-breadcrumb-block"> <nav class="breadcrumb" role="navigation" aria-labelledby="system-breadcrumb"> <h2 id="system-breadcrumb" class="visually-hidden">Breadcrumb</h2> <ol> <li> <a href="/">Home</a> </li> <li> <a href="/news">News</a> </li> </ol> </nav> 
</div> </div> <div class="hero-section__title"> <div class="block block-layout-builder block-field-blocknodelong-storytitle"> <span class="field field--name-title field--type-string field--label-hidden"><h1>New machine learning-based single-cell search engine makes cell annotation faster, more efficient</h1> </span> </div> </div> <div class="hero-section__description"> <div class="block block-layout-builder block-field-blocknodelong-storybody"> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>In a Q&amp;A, machine learning expert Mehrtash Babadi introduces Cell Annotation Service, a search engine for single-cell data that he and his group have developed for biologists.</p> </div> </div> </div> <div class="hero-section__author"> <div class="block block-layout-builder block-extra-field-blocknodelong-storyextra-field-author-custom"> By Corie Lok </div> </div> <div class="hero-section__date"> <div class="block block-layout-builder block-field-blocknodelong-storycreated"> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2024-10-28T10:55:51-04:00" title="Monday, October 28, 2024 - 10:55" class="datetime">October 28, 2024</time> </span> </div> </div> </div> <div class="hero-section__right col-6"> <div class="hero-section__image"> <div class="block block-layout-builder block-field-blocknodelong-storyfield-image"> <div class="field field--name-field-image field--type-entity-reference field--label-hidden field__item"> <article class="media media--type-image media--view-mode-multiple-content-types-header"> <div class="field field--name-field-media-image field--type-image field--label-hidden field__item"> <picture> <source srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=KMCK4-q2 1x" media="all and (min-width: 1921px)" type="image/png" width="754" height="503"> <source 
srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=KMCK4-q2 1x" media="all and (min-width: 1601px) and (max-width: 1920px)" type="image/png" width="754" height="503"> <source srcset="/files/styles/multiple_ct_header_desktop/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=GwmSutWH 1x" media="all and (min-width: 1340px) and (max-width: 1600px)" type="image/png" width="736" height="520"> <source srcset="/files/styles/multiple_ct_header_laptop/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=1wmbgGo3 1x" media="all and (min-width: 800px) and (max-width: 1339px)" type="image/png" width="641" height="451"> <source srcset="/files/styles/multiple_ct_header_tablet/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=6SC_ihEQ 1x" media="all and (min-width: 540px) and (max-width: 799px)" type="image/png" width="706" height="417"> <source srcset="/files/styles/multiple_ct_header_phone/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=50nHF2cf 1x" media="all and (max-width: 539px)" type="image/png" width="499" height="294"> <img loading="eager" width="499" height="294" src="/files/styles/multiple_ct_header_phone/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=50nHF2cf" alt="Illustration depicting a search engine focused on cells and displaying search results" title="Illustration depicting a search engine focused on cells and displaying search results" typeof="foaf:Image"> </picture> </div> <div class="media-caption"> <div class="media-caption__credit"> Credit: Ricardo Job-Reese, Ó³»­´«Ã½ Communications </div> <div class="media-caption__description"> </div> </div> </article> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main"> <div class="block block-better-social-sharing-buttons block-social-sharing-buttons-block"> <div style="display: none"><link rel="preload" 
href="/modules/contrib/better_social_sharing_buttons/assets/dist/sprites/social-icons--no-color.svg" as="image" type="image/svg+xml" crossorigin="anonymous"></div> <div class="social-sharing-buttons"> <a href="https://www.facebook.com/sharer/sharer.php?u=/taxonomy/term/956/feed&amp;title=" target="_blank" title="Share to Facebook" aria-label="Share to Facebook" class="social-sharing-buttons-button share-facebook" rel="noopener"> <svg aria-hidden="true" width="32px" height="32px" style="border-radius:100%;"> <use href="/modules/contrib/better_social_sharing_buttons/assets/dist/sprites/social-icons--no-color.svg#facebook" /> </svg> </a> <a href="https://twitter.com/intent/tweet?text=+/taxonomy/term/956/feed" target="_blank" title="Share to X" aria-label="Share to X" class="social-sharing-buttons-button share-x" rel="noopener"> <svg aria-hidden="true" width="32px" height="32px" style="border-radius:100%;"> <use href="/modules/contrib/better_social_sharing_buttons/assets/dist/sprites/social-icons--no-color.svg#x" /> </svg> </a> <a href="mailto:?subject=&amp;body=/taxonomy/term/956/feed" title="Share to Email" aria-label="Share to Email" class="social-sharing-buttons-button share-email" target="_blank" rel="noopener"> <svg aria-hidden="true" width="32px" height="32px" style="border-radius:100%;"> <use href="/modules/contrib/better_social_sharing_buttons/assets/dist/sprites/social-icons--no-color.svg#email" /> </svg> </a> </div> </div> <div class="block block-layout-builder block-field-blocknodelong-storyfield-content-paragraphs"> <div class="field field--name-field-content-paragraphs field--type-entity-reference-revisions field--label-hidden field__items"> <div class="field__item"> <div class="paragraph paragraph--type--text-narrow paragraph--view-mode--default"> <div class="clearfix text-formatted field field--name-field-text field--type-text-long field--label-hidden field__item"><p>One of the first steps for researchers in studying and analyzing single cells is to 
determine the cells’ identity: what type and subtype of cells are these, and how similar or different are they to previously analyzed cells? Scientists then annotate the cells with this information, a process that can take days or even weeks, depending on the number of cells being labeled, and requires labor-intensive literature and database searches.&nbsp;</p> <p>To speed up the annotation step, the Ó³»­´«Ã½â€™s Data Sciences Platform (DSP) has developed a new search engine that automates much of this process by using machine learning to search data on more than 50 million annotated single cells. The tool, <a href="https://cellarium.ai/tool/cellarium-cell-annotation-service-cas/">Cell Annotation Service</a> (CAS), promises to reduce cell annotation time from many hours to just one, and was recently <a href="https://cellarium.ai/cell-annotation-service-cas-access/">released in beta mode for scientists to use</a>.</p> <p>To learn more about CAS, we spoke with <a href="/bios/mehrtash-babadi">Mehrtash Babadi</a>, an institute scientist and director of computational methods in DSP. Babadi leads the group that built the new tool.</p> <p> <strong>How does CAS work?</strong></p> <p>CAS uses some of the same techniques behind reverse image search, which uses a search engine to find other images similar to the image you want to identify. We wanted to build a tool like that for cell biology. So we took lots of reference single-cell RNA sequencing data from atlases and used our scalable machine learning algorithms to embed all of the gene expression data on these cells into compact vector representations — you can think of these as a signature for each cell.&nbsp;</p> <p>When you have a new cell you’re interested in studying, you can use CAS to compare and match your new cell with all these reference cells based on their signatures, and nominate cells that are similar to yours. It’s basically a search engine. You give it a cell, and it shows you similar cells. 
And when you give it a single-cell dataset, it generates annotations and labels for you by doing this search and carrying over the labels from similar cells to your cells.</p> <p> <strong>How did you build the search engine?</strong></p> <p>Several components of CAS were initially funded by the NIH through the Center for Human Brain Variation at the Ó³»­´«Ã½, where I serve as a co-investigator. We developed the <a href="https://cellarium.ai/tool/cellarium-ai-platform/">Cellarium AI platform</a>, which powers CAS, to support researchers at the center analyzing massive datasets generated from studying hundreds of human brains, spanning multiple brain regions and tens of thousands of cells per region. Around 2022, we were in discussions with 10x Genomics about potential collaborative research projects. During these conversations, we realized that the platform could be applied beyond its initial scope. CAS emerged as one of these applications, with additional funding provided by 10x Genomics.</p> <p>As the first step, we built a software platform that could store vast amounts of single-cell data, query out these data, and then use that data to train large machine learning models and generate these embeddings, or signatures, from lots of single-cell data. We trained our models on close to 87 million cells from nearly 1,400 published studies — all of the cells in the <a href="https://cellxgene.cziscience.com/">CZ CELLxGENE</a> repository, which has been built and curated by the Chan Zuckerberg Initiative. CZ CELLxGENE made sure these datasets were harmonized at the level of the metadata attached to the cells, which made the datasets really useful for machine learning.</p> <p> <strong>Can you give some examples of how biologists can use CAS and what they can learn from it?</strong></p> <p>One application is determining cell type. Let’s say you have a cell and you know its gene expression profile. You want to know: what is the crude type of the cell? Is it a T cell? 
If it's a T-cell, is it a CD8+ T cell? If it is, is it like a naive, thymus-derived CD8+ T cell? Just by entering the gene expression profile of your new cell, you can narrow down the possibilities of what cell type you're dealing with.</p> <p>Another application is to identify whether the cell state you’re seeing is something typically encountered in tissues from healthy donors or in tissues from people with a particular disease. You can also ask: is this cell specific to the tissue you are studying or is it common to multiple tissues?&nbsp;</p> <p>Let’s say you have a therapeutic that is targeting a specific cell state identified in the context of a certain disease. You may want to know whether the same disease mechanism that is driven by these cells is present in other diseases. If the answer is yes, then you have a good hypothesis to extend the indication for that therapeutic to now include the new diseases.&nbsp;</p> <p> <strong>Is CAS now available to use?</strong></p> <p>Yes. The CAS model and framework we developed in collaboration with 10x Genomics is now offered to users in 10x Genomics’ Cloud Analysis Automated Cell Annotation pipeline. 10x Genomics is a provider of instruments and assays for single-cell analysis and the first interaction many users have with their single-cell data is through 10x software. We thought it would be interesting if that initial interaction could be more informative, so that not only would you see technical information about your experiment, such as the number of sequenced cells and their quality, but you’d also be able to learn more about those cells, like what cell types they are and all of the things we’ve talked about here.</p> <p>To make CAS accessible to a broader audience, including those looking to integrate the service into their own interactive or batch analysis workflows, we’re launching our implementation of CAS as a public beta service. 
Users can sign up by navigating to the <a href="https://cellarium.ai/tool/cellarium-cell-annotation-service-cas/">CAS landing page</a>, scrolling to the bottom of the page, and filling out the <a href="https://cellarium.ai/cell-annotation-service-cas-access/">sign-up form</a>.&nbsp;</p> <p>During the beta phase, CAS is offered at no cost, with a usage limit of 100,000 individually annotated cells per week and 200,000 individually annotated cells in total. This quota lets us provide the service to a larger and more diverse user base. Currently, the embedding model powering CAS is the same as the cell annotation pipeline offered by 10x Genomics, though future models and features may evolve separately in alignment with the development roadmaps of each organization.</p> <p> <strong>Overall, how can AI help advance cell biology?&nbsp;</strong></p> <p>One way is to make information more accessible and more integrated. We’re hoping CAS is taking the first step in that direction, by just making information more findable.&nbsp;</p> <p>The second way is to integrate all of the cell biology knowledge we have accumulated and keep accumulating into a cohesive fabric. Nowadays the paradigm is to build very large foundation models that have an integrated understanding of all the data that we've generated. This would allow us to make good predictions, by fine-tuning these models on cellular perturbation experiments, and this could potentially help unravel mechanisms underlying cell function that have remained hidden so far. This second problem is a different type of problem and it’s much harder. It's not just about being able to make old data more easily findable, but it's about being able to synthesize new data based on old data. 
That is our main vision for the future of all the work we're doing.</p> </div> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main"> <div class="block-node-broad-tags block block-layout-builder block-field-blocknodelong-storyfield-broad-tags"> <div class="block-node-broad-tags__row"> <div class="block-node-broad-tags__title">Tags:</div> <div class="field field--name-field-broad-tags field--type-entity-reference field--label-hidden field__items"> <div class="field__item"><a href="/broad-tags/data-science" hreflang="en">Data Sciences Platform</a></div> <div class="field__item"><a href="/broad-tags/machine-learning-0" hreflang="en">Machine Learning</a></div> <div class="field__item"><a href="/broad-tags/single-cell" hreflang="en">Single Cell</a></div> </div> </div> </div> </div> </div> Fri, 28 Apr 2023 14:00:07 +0000 tulrich@broadinstitute.org 1282131 at #WhyIScience Q&A: A machine learning engineer builds algorithms to improve clinical research /blog/whyiscience-qa-machine-learning-engineer-builds-algorithms-improve-clinical-research <span class="field field--name-title field--type-string field--label-hidden"><h1>New machine learning-based single-cell search engine makes cell annotation faster, more efficient</h1> </span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"> <span>By Corie Lok</span> </span> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2024-10-28T10:55:51-04:00" class="datetime">October 28, 2024</time> </span> <div class="hero-section container"> <div class="hero-section__row row"> <div class="hero-section__content hero-section__content_left col-6"> <div class="hero-section__breadcrumbs"> <div class="block block-system block-system-breadcrumb-block"> <nav class="breadcrumb" role="navigation" aria-labelledby="system-breadcrumb"> <h2 id="system-breadcrumb" class="visually-hidden">Breadcrumb</h2> <ol> <li> <a 
href="/">Home</a> </li> <li> <a href="/news">News</a> </li> </ol> </nav> </div> </div> <div class="hero-section__title"> <div class="block block-layout-builder block-field-blocknodelong-storytitle"> <span class="field field--name-title field--type-string field--label-hidden"><h1>New machine learning-based single-cell search engine makes cell annotation faster, more efficient</h1> </span> </div> </div> <div class="hero-section__description"> <div class="block block-layout-builder block-field-blocknodelong-storybody"> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>In a Q&amp;A, machine learning expert Mehrtash Babadi introduces Cell Annotation Service, a search engine for single-cell data that he and his group have developed for biologists.</p> </div> </div> </div> <div class="hero-section__author"> <div class="block block-layout-builder block-extra-field-blocknodelong-storyextra-field-author-custom"> By Corie Lok </div> </div> <div class="hero-section__date"> <div class="block block-layout-builder block-field-blocknodelong-storycreated"> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2024-10-28T10:55:51-04:00" title="Monday, October 28, 2024 - 10:55" class="datetime">October 28, 2024</time> </span> </div> </div> </div> <div class="hero-section__right col-6"> <div class="hero-section__image"> <div class="block block-layout-builder block-field-blocknodelong-storyfield-image"> <div class="field field--name-field-image field--type-entity-reference field--label-hidden field__item"> <article class="media media--type-image media--view-mode-multiple-content-types-header"> <div class="field field--name-field-media-image field--type-image field--label-hidden field__item"> <picture> <source srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=KMCK4-q2 1x" media="all and (min-width: 1921px)" 
type="image/png" width="754" height="503"> <source srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=KMCK4-q2 1x" media="all and (min-width: 1601px) and (max-width: 1920px)" type="image/png" width="754" height="503"> <source srcset="/files/styles/multiple_ct_header_desktop/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=GwmSutWH 1x" media="all and (min-width: 1340px) and (max-width: 1600px)" type="image/png" width="736" height="520"> <source srcset="/files/styles/multiple_ct_header_laptop/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=1wmbgGo3 1x" media="all and (min-width: 800px) and (max-width: 1339px)" type="image/png" width="641" height="451"> <source srcset="/files/styles/multiple_ct_header_tablet/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=6SC_ihEQ 1x" media="all and (min-width: 540px) and (max-width: 799px)" type="image/png" width="706" height="417"> <source srcset="/files/styles/multiple_ct_header_phone/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=50nHF2cf 1x" media="all and (max-width: 539px)" type="image/png" width="499" height="294"> <img loading="eager" width="499" height="294" src="/files/styles/multiple_ct_header_phone/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=50nHF2cf" alt="Illustration depicting a search engine focused on cells and displaying search results" title="Illustration depicting a search engine focused on cells and displaying search results" typeof="foaf:Image"> </picture> </div> <div class="media-caption"> <div class="media-caption__credit"> Credit: Ricardo Job-Reese, Broad Institute Communications </div> <div class="media-caption__description"> </div> </div> </article> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main">
<div class="block block-layout-builder block-field-blocknodelong-storyfield-content-paragraphs"> <div class="field field--name-field-content-paragraphs field--type-entity-reference-revisions field--label-hidden field__items"> <div class="field__item"> <div class="paragraph paragraph--type--text-narrow paragraph--view-mode--default"> <div class="clearfix text-formatted field field--name-field-text field--type-text-long field--label-hidden field__item"><p>One of the first steps for researchers in studying and analyzing single cells is to
determine the cells’ identity: what type and subtype of cells are these, and how similar or different are they to previously analyzed cells? Scientists then annotate the cells with this information, a process that can take days or even weeks, depending on the number of cells being labeled, and requires labor-intensive literature and database searches.&nbsp;</p> <p>To speed up the annotation step, the Broad Institute’s Data Sciences Platform (DSP) has developed a new search engine that automates much of this process by using machine learning to search data on more than 50 million annotated single cells. The tool, <a href="https://cellarium.ai/tool/cellarium-cell-annotation-service-cas/">Cell Annotation Service</a> (CAS), promises to reduce cell annotation time from many hours to just one, and was recently <a href="https://cellarium.ai/cell-annotation-service-cas-access/">released in beta mode for scientists to use</a>.</p> <p>To learn more about CAS, we spoke with <a href="/bios/mehrtash-babadi">Mehrtash Babadi</a>, an institute scientist and director of computational methods in DSP. Babadi leads the group that built the new tool.</p> <p> <strong>How does CAS work?</strong></p> <p>CAS uses some of the same techniques behind reverse image search, which uses a search engine to find other images similar to the image you want to identify. We wanted to build a tool like that for cell biology. So we took lots of reference single-cell RNA sequencing data from atlases and used our scalable machine learning algorithms to embed all of the gene expression data on these cells into compact vector representations. You can think of these as a signature for each cell.&nbsp;</p> <p>When you have a new cell you’re interested in studying, you can use CAS to compare and match your new cell with all these reference cells based on their signatures, and nominate cells that are similar to yours. It’s basically a search engine. You give it a cell, and it shows you similar cells.
And when you give it a single-cell dataset, it generates annotations and labels for you by doing this search and carrying over the labels from similar cells to your cells.</p> <p> <strong>How did you build the search engine?</strong></p> <p>Several components of CAS were initially funded by the NIH through the Center for Human Brain Variation at the Broad Institute, where I serve as a co-investigator. We developed the <a href="https://cellarium.ai/tool/cellarium-ai-platform/">Cellarium AI platform</a>, which powers CAS, to support researchers at the center analyzing massive datasets generated from studying hundreds of human brains, spanning multiple brain regions and tens of thousands of cells per region. Around 2022, we were in discussions with 10x Genomics about potential collaborative research projects. During these conversations, we realized that the platform could be applied beyond its initial scope. CAS emerged as one of these applications, with additional funding provided by 10x Genomics.</p> <p>As the first step, we built a software platform that could store vast amounts of single-cell data, query those data, and then use them to train large machine learning models and generate these embeddings, or signatures, from lots of single-cell data. We trained our models on close to 87 million cells from nearly 1,400 published studies: all of the cells in the <a href="https://cellxgene.cziscience.com/">CZ CELLxGENE</a> repository, which has been built and curated by the Chan Zuckerberg Initiative. CZ CELLxGENE made sure these datasets were harmonized at the level of the metadata attached to the cells, which made the datasets really useful for machine learning.</p> <p> <strong>Can you give some examples of how biologists can use CAS and what they can learn from it?</strong></p> <p>One application is determining cell type. Let’s say you have a cell and you know its gene expression profile. You want to know: what is the crude type of the cell? Is it a T cell?
If it's a T cell, is it a CD8+ T cell? If it is, is it like a naive, thymus-derived CD8+ T cell? Just by entering the gene expression profile of your new cell, you can narrow down the possibilities of what cell type you're dealing with.</p> <p>Another application is to identify whether the cell state you’re seeing is something typically encountered in tissues from healthy donors or in tissues from people with a particular disease. You can also ask: is this cell specific to the tissue you are studying or is it common to multiple tissues?&nbsp;</p> <p>Let’s say you have a therapeutic that is targeting a specific cell state identified in the context of a certain disease. You may want to know whether the same disease mechanism that is driven by these cells is present in other diseases. If the answer is yes, then you have a good hypothesis to extend the indication for that therapeutic to now include the new diseases.&nbsp;</p> <p> <strong>Is CAS now available to use?</strong></p> <p>Yes. The CAS model and framework we developed in collaboration with 10x Genomics are now offered to users in 10x Genomics’ Cloud Analysis Automated Cell Annotation pipeline. 10x Genomics is a provider of instruments and assays for single-cell analysis, and the first interaction many users have with their single-cell data is through 10x software. We thought it would be interesting if that initial interaction could be more informative, so that not only would you see technical information about your experiment, such as the number of sequenced cells and their quality, but you’d also be able to learn more about those cells, like what cell types they are and all of the things we’ve talked about here.</p> <p>To make CAS accessible to a broader audience, including those looking to integrate the service into their own interactive or batch analysis workflows, we’re launching our implementation of CAS as a public beta service.
Users can sign up by navigating to the <a href="https://cellarium.ai/tool/cellarium-cell-annotation-service-cas/">CAS landing page</a>, scrolling to the bottom of the page, and filling out the <a href="https://cellarium.ai/cell-annotation-service-cas-access/">sign-up form</a>.&nbsp;</p> <p>During the beta phase, CAS is offered at no cost, with a usage limit of 100,000 individually annotated cells per week and 200,000 individually annotated cells in total. This quota lets us provide the service to a larger and more diverse user base. Currently, the embedding model powering CAS is the same one used in the cell annotation pipeline offered by 10x Genomics, though future models and features may evolve separately in alignment with the development roadmaps of each organization.</p> <p> <strong>Overall, how can AI help advance cell biology?&nbsp;</strong></p> <p>One way is to make information more accessible and more integrated. We’re hoping CAS is taking the first step in that direction, by just making information more findable.&nbsp;</p> <p>The second way is to integrate all of the cell biology knowledge we have accumulated and keep accumulating into a cohesive fabric. Nowadays the paradigm is to build very large foundation models that have an integrated understanding of all the data that we've generated. This would allow us to make good predictions, by fine-tuning these models on cellular perturbation experiments, and this could potentially help unravel mechanisms underlying cell function that have remained hidden so far. This second problem is a different type of problem and it’s much harder. It's not just about being able to make old data more easily findable; it's also about being able to synthesize new data based on old data.
That is our main vision for the future of all the work we're doing.</p> </div> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main"> <div class="block-node-broad-tags block block-layout-builder block-field-blocknodelong-storyfield-broad-tags"> <div class="block-node-broad-tags__row"> <div class="block-node-broad-tags__title">Tags:</div> <div class="field field--name-field-broad-tags field--type-entity-reference field--label-hidden field__items"> <div class="field__item"><a href="/broad-tags/data-science" hreflang="en">Data Sciences Platform</a></div> <div class="field__item"><a href="/broad-tags/machine-learning-0" hreflang="en">Machine Learning</a></div> <div class="field__item"><a href="/broad-tags/single-cell" hreflang="en">Single Cell</a></div> </div> </div> </div> </div> </div> Tue, 06 Dec 2022 18:47:34 +0000 adicorat 1242901 at Broad Institute and Microsoft collaborate to help accelerate disease research with scalable analytical tools /news/broad-institute-and-microsoft-collaborate-help-accelerate-disease-research-scalable-analytical
Tue, 03 May 2022 17:41:36 +0000 kzusi@broadinstitute.org 1131946 at Broad Institute granted FedRAMP authorization for Terra platform /news/broad-institute-granted-fedramp-authorization-terra-platform <span class="field field--name-title field--type-string field--label-hidden"><h1>New machine learning-based single-cell search engine makes cell annotation faster, more efficient</h1> </span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"> <span>By Corie Lok</span> </span> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2024-10-28T10:55:51-04:00" class="datetime">October 28, 2024</time> </span> <div class="hero-section container"> <div class="hero-section__row row"> <div class="hero-section__content hero-section__content_left col-6">
<div class="hero-section__title"> <div class="block block-layout-builder block-field-blocknodelong-storytitle"> <span class="field field--name-title field--type-string field--label-hidden"><h1>New machine learning-based single-cell search engine makes cell annotation faster, more efficient</h1> </span> </div> </div> <div class="hero-section__description"> <div class="block block-layout-builder block-field-blocknodelong-storybody"> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>In a Q&amp;A, machine learning expert Mehrtash Babadi introduces Cell Annotation Service, a search engine for single-cell data that he and his group have developed for biologists.</p> </div> </div> </div> <div class="hero-section__author"> <div class="block block-layout-builder block-extra-field-blocknodelong-storyextra-field-author-custom"> By Corie Lok </div> </div> <div class="hero-section__date"> <div class="block block-layout-builder block-field-blocknodelong-storycreated"> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2024-10-28T10:55:51-04:00" title="Monday, October 28, 2024 - 10:55" class="datetime">October 28, 2024</time> </span> </div> </div> </div> <div class="hero-section__right col-6"> <div class="hero-section__image"> <div class="block block-layout-builder block-field-blocknodelong-storyfield-image"> <div class="field field--name-field-image field--type-entity-reference field--label-hidden field__item"> <article class="media media--type-image media--view-mode-multiple-content-types-header"> <div class="field field--name-field-media-image field--type-image field--label-hidden field__item"> <picture> <source srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=KMCK4-q2 1x" media="all and (min-width: 1921px)" type="image/png" width="754" height="503"> <source
srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=KMCK4-q2 1x" media="all and (min-width: 1601px) and (max-width: 1920px)" type="image/png" width="754" height="503"> <source srcset="/files/styles/multiple_ct_header_desktop/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=GwmSutWH 1x" media="all and (min-width: 1340px) and (max-width: 1600px)" type="image/png" width="736" height="520"> <source srcset="/files/styles/multiple_ct_header_laptop/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=1wmbgGo3 1x" media="all and (min-width: 800px) and (max-width: 1339px)" type="image/png" width="641" height="451"> <source srcset="/files/styles/multiple_ct_header_tablet/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=6SC_ihEQ 1x" media="all and (min-width: 540px) and (max-width: 799px)" type="image/png" width="706" height="417"> <source srcset="/files/styles/multiple_ct_header_phone/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=50nHF2cf 1x" media="all and (max-width: 539px)" type="image/png" width="499" height="294"> <img loading="eager" width="499" height="294" src="/files/styles/multiple_ct_header_phone/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=50nHF2cf" alt="Illustration depicting a search engine focused on cells and displaying search results" title="Illustration depicting a search engine focused on cells and displaying search results" typeof="foaf:Image"> </picture> </div> <div class="media-caption"> <div class="media-caption__credit"> Credit: Ricardo Job-Reese, Broad Institute Communications </div> <div class="media-caption__description"> </div> </div> </article> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main">
<div class="block block-layout-builder block-field-blocknodelong-storyfield-content-paragraphs"> <div class="field field--name-field-content-paragraphs field--type-entity-reference-revisions field--label-hidden field__items"> <div class="field__item"> <div class="paragraph paragraph--type--text-narrow paragraph--view-mode--default"> <div class="clearfix text-formatted field field--name-field-text field--type-text-long field--label-hidden field__item"><p>One of the first steps for researchers in studying and analyzing single cells is to
determine the cells’ identity: what type and subtype of cells are these, and how similar or different are they to previously analyzed cells? Scientists then annotate the cells with this information, a process that can take days or even weeks, depending on the number of cells being labeled, and requires labor-intensive literature and database searches.&nbsp;</p> <p>To speed up the annotation step, the Broad Institute’s Data Sciences Platform (DSP) has developed a new search engine that automates much of this process by using machine learning to search data on more than 50 million annotated single cells. The tool, <a href="https://cellarium.ai/tool/cellarium-cell-annotation-service-cas/">Cell Annotation Service</a> (CAS), promises to reduce cell annotation time from many hours to just one, and was recently <a href="https://cellarium.ai/cell-annotation-service-cas-access/">released in beta mode for scientists to use</a>.</p> <p>To learn more about CAS, we spoke with <a href="/bios/mehrtash-babadi">Mehrtash Babadi</a>, an institute scientist and director of computational methods in DSP. Babadi leads the group that built the new tool.</p> <p> <strong>How does CAS work?</strong></p> <p>CAS uses some of the same techniques behind reverse image search, which uses a search engine to find other images similar to the image you want to identify. We wanted to build a tool like that for cell biology. So we took lots of reference single-cell RNA sequencing data from atlases and used our scalable machine learning algorithms to embed all of the gene expression data on these cells into compact vector representations. You can think of these as a signature for each cell.&nbsp;</p> <p>When you have a new cell you’re interested in studying, you can use CAS to compare and match your new cell with all these reference cells based on their signatures, and nominate cells that are similar to yours. It’s basically a search engine. You give it a cell, and it shows you similar cells.
And when you give it a single-cell dataset, it generates annotations and labels for you by doing this search and carrying over the labels from similar cells to your cells.</p> <p> <strong>How did you build the search engine?</strong></p> <p>Several components of CAS were initially funded by the NIH through the Center for Human Brain Variation at the Broad Institute, where I serve as a co-investigator. We developed the <a href="https://cellarium.ai/tool/cellarium-ai-platform/">Cellarium AI platform</a>, which powers CAS, to support researchers at the center analyzing massive datasets generated from studying hundreds of human brains, spanning multiple brain regions and tens of thousands of cells per region. Around 2022, we were in discussions with 10x Genomics about potential collaborative research projects. During these conversations, we realized that the platform could be applied beyond its initial scope. CAS emerged as one of these applications, with additional funding provided by 10x Genomics.</p> <p>As the first step, we built a software platform that could store vast amounts of single-cell data, query those data, and then use them to train large machine learning models and generate these embeddings, or signatures, from lots of single-cell data. We trained our models on close to 87 million cells from nearly 1,400 published studies: all of the cells in the <a href="https://cellxgene.cziscience.com/">CZ CELLxGENE</a> repository, which has been built and curated by the Chan Zuckerberg Initiative. CZ CELLxGENE made sure these datasets were harmonized at the level of the metadata attached to the cells, which made the datasets really useful for machine learning.</p> <p> <strong>Can you give some examples of how biologists can use CAS and what they can learn from it?</strong></p> <p>One application is determining cell type. Let’s say you have a cell and you know its gene expression profile. You want to know: what is the crude type of the cell? Is it a T cell?
If it's a T cell, is it a CD8+ T cell? If it is, is it like a naive, thymus-derived CD8+ T cell? Just by entering the gene expression profile of your new cell, you can narrow down the possibilities of what cell type you're dealing with.</p> <p>Another application is to identify whether the cell state you’re seeing is something typically encountered in tissues from healthy donors or in tissues from people with a particular disease. You can also ask: is this cell specific to the tissue you are studying or is it common to multiple tissues?&nbsp;</p> <p>Let’s say you have a therapeutic that is targeting a specific cell state identified in the context of a certain disease. You may want to know whether the same disease mechanism that is driven by these cells is present in other diseases. If the answer is yes, then you have a good hypothesis to extend the indication for that therapeutic to now include the new diseases.&nbsp;</p> <p> <strong>Is CAS now available to use?</strong></p> <p>Yes. The CAS model and framework we developed in collaboration with 10x Genomics are now offered to users in 10x Genomics’ Cloud Analysis Automated Cell Annotation pipeline. 10x Genomics is a provider of instruments and assays for single-cell analysis, and the first interaction many users have with their single-cell data is through 10x software. We thought it would be interesting if that initial interaction could be more informative, so that not only would you see technical information about your experiment, such as the number of sequenced cells and their quality, but you’d also be able to learn more about those cells, like what cell types they are and all of the things we’ve talked about here.</p> <p>To make CAS accessible to a broader audience, including those looking to integrate the service into their own interactive or batch analysis workflows, we’re launching our implementation of CAS as a public beta service.
Users can sign up by navigating to the <a href="https://cellarium.ai/tool/cellarium-cell-annotation-service-cas/">CAS landing page</a>, scrolling to the bottom of the page, and filling out the <a href="https://cellarium.ai/cell-annotation-service-cas-access/">sign-up form</a>.&nbsp;</p> <p>During the beta phase, CAS is offered at no cost, with a usage limit of 100,000 individually annotated cells per week and 200,000 individually annotated cells in total. This quota lets us provide the service to a larger and more diverse user base. Currently, the embedding model powering CAS is the same as the one in the cell annotation pipeline offered by 10x Genomics, though future models and features may evolve separately in alignment with the development roadmaps of each organization.</p> <p> <strong>Overall, how can AI help advance cell biology?&nbsp;</strong></p> <p>One way is to make information more accessible and more integrated. We’re hoping CAS is taking the first step in that direction, by just making information more findable.&nbsp;</p> <p>The second way is to integrate all of the cell biology knowledge we have accumulated and keep accumulating into a cohesive fabric. Nowadays the paradigm is to build very large foundation models that have an integrated understanding of all the data that we've generated. This would allow us to make good predictions, by fine-tuning these models on cellular perturbation experiments, and this could potentially help unravel mechanisms underlying cell function that have remained hidden so far. This second problem is a different type of problem and it’s much harder. It's not just about making old data more easily findable; it's about being able to synthesize new data based on old data.
That is our main vision for the future of all the work we're doing.</p> </div> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main"> <div class="block-node-broad-tags block block-layout-builder block-field-blocknodelong-storyfield-broad-tags"> <div class="block-node-broad-tags__row"> <div class="block-node-broad-tags__title">Tags:</div> <div class="field field--name-field-broad-tags field--type-entity-reference field--label-hidden field__items"> <div class="field__item"><a href="/broad-tags/data-science" hreflang="en">Data Sciences Platform</a></div> <div class="field__item"><a href="/broad-tags/machine-learning-0" hreflang="en">Machine Learning</a></div> <div class="field__item"><a href="/broad-tags/single-cell" hreflang="en">Single Cell</a></div> </div> </div> </div> </div> </div> Mon, 24 May 2021 14:05:17 +0000 kzusi@broadinstitute.org 894666 at Broad Institute and Verily partner with Microsoft to accelerate the next generation of the Terra platform for health and life science research /news/broad-institute-and-verily-partner-microsoft-accelerate-next-generation-terra-platform-health
Mon, 11 Jan 2021 14:06:30 +0000 kzusi@broadinstitute.org 727341 at Q&A: Genomic epidemiology reveals patterns of SARS-CoV-2 transmission from greater Boston /blog/qa-genomic-epidemiology-reveals-patterns-sars-cov-2-transmission-greater-boston <span class="field field--name-title field--type-string field--label-hidden"><h1>New machine learning-based single-cell search engine makes cell annotation faster, more efficient</h1> </span> <span class="field field--name-uid field--type-entity-reference field--label-hidden"> <span>By Corie Lok</span> </span> <span
class="field field--name-created field--type-created field--label-hidden"><time datetime="2024-10-28T10:55:51-04:00" class="datetime">October 28, 2024</time> </span> <div class="hero-section container"> <div class="hero-section__row row"> <div class="hero-section__content hero-section__content_left col-6"> <div class="hero-section__breadcrumbs"> <div class="block block-system block-system-breadcrumb-block"> <nav class="breadcrumb" role="navigation" aria-labelledby="system-breadcrumb"> <h2 id="system-breadcrumb" class="visually-hidden">Breadcrumb</h2> <ol> <li> <a href="/">Home</a> </li> <li> <a href="/news">News</a> </li> </ol> </nav> </div> </div> <div class="hero-section__title"> <div class="block block-layout-builder block-field-blocknodelong-storytitle"> <span class="field field--name-title field--type-string field--label-hidden"><h1>New machine learning-based single-cell search engine makes cell annotation faster, more efficient</h1> </span> </div> </div> <div class="hero-section__description"> <div class="block block-layout-builder block-field-blocknodelong-storybody"> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>In a Q&amp;A, machine learning expert Mehrtash Babadi introduces Cell Annotation Service, a search engine for single-cell data that he and his group have developed for biologists.</p> </div> </div> </div> <div class="hero-section__author"> <div class="block block-layout-builder block-extra-field-blocknodelong-storyextra-field-author-custom"> By Corie Lok </div> </div> <div class="hero-section__date"> <div class="block block-layout-builder block-field-blocknodelong-storycreated"> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2024-10-28T10:55:51-04:00" title="Monday, October 28, 2024 - 10:55" class="datetime">October 28, 2024</time> </span> </div> </div> </div> <div class="hero-section__right col-6"> <div 
class="hero-section__image"> <div class="block block-layout-builder block-field-blocknodelong-storyfield-image"> <div class="field field--name-field-image field--type-entity-reference field--label-hidden field__item"> <article class="media media--type-image media--view-mode-multiple-content-types-header"> <div class="field field--name-field-media-image field--type-image field--label-hidden field__item"> <picture> <source srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=KMCK4-q2 1x" media="all and (min-width: 1921px)" type="image/png" width="754" height="503"> <source srcset="/files/styles/multiple_ct_header_desktop_xl/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=KMCK4-q2 1x" media="all and (min-width: 1601px) and (max-width: 1920px)" type="image/png" width="754" height="503"> <source srcset="/files/styles/multiple_ct_header_desktop/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=GwmSutWH 1x" media="all and (min-width: 1340px) and (max-width: 1600px)" type="image/png" width="736" height="520"> <source srcset="/files/styles/multiple_ct_header_laptop/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=1wmbgGo3 1x" media="all and (min-width: 800px) and (max-width: 1339px)" type="image/png" width="641" height="451"> <source srcset="/files/styles/multiple_ct_header_tablet/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=6SC_ihEQ 1x" media="all and (min-width: 540px) and (max-width: 799px)" type="image/png" width="706" height="417"> <source srcset="/files/styles/multiple_ct_header_phone/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=50nHF2cf 1x" media="all and (max-width: 539px)" type="image/png" width="499" height="294"> <img loading="eager" width="499" height="294" src="/files/styles/multiple_ct_header_phone/public/longstory/CAS-NewStory_v01%20%281%29.png?itok=50nHF2cf" alt="Illustration depicting a search engine focused on cells and displaying search results" title="Illustration depicting a 
search engine focused on cells and displaying search results" typeof="foaf:Image"> </picture> </div> <div class="media-caption"> <div class="media-caption__credit"> Credit: Ricardo Job-Reese, Broad Institute Communications </div> <div class="media-caption__description"> </div> </div> </article> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main">
<div class="block block-layout-builder block-field-blocknodelong-storyfield-content-paragraphs"> <div class="field field--name-field-content-paragraphs field--type-entity-reference-revisions field--label-hidden field__items"> <div class="field__item"> <div class="paragraph paragraph--type--text-narrow paragraph--view-mode--default"> <div class="clearfix text-formatted field field--name-field-text field--type-text-long field--label-hidden field__item"><p>One of the first steps for researchers in studying and analyzing single cells is to determine the cells’ identity: what type and subtype of cells are these, and how similar or different are they to previously analyzed cells? Scientists then annotate the cells with this information, a process that can take days or even weeks, depending on the number of cells being labeled, and requires labor-intensive literature and database searches.&nbsp;</p> <p>To speed up the annotation step, the Broad Institute’s Data Sciences Platform (DSP) has developed a new search engine that automates much of this process by using machine learning to search data on more than 50 million annotated single cells. The tool, <a href="https://cellarium.ai/tool/cellarium-cell-annotation-service-cas/">Cell Annotation Service</a> (CAS), promises to reduce cell annotation time from many hours to just one, and was recently <a href="https://cellarium.ai/cell-annotation-service-cas-access/">released in beta mode for scientists to use</a>.</p> <p>To learn more about CAS, we spoke with <a href="/bios/mehrtash-babadi">Mehrtash Babadi</a>, an institute scientist and director of computational methods in DSP. Babadi leads the group that built the new tool.</p> <p> <strong>How does CAS work?</strong></p> <p>CAS uses some of the same techniques behind reverse image search, which uses a search engine to find other images similar to the image you want to identify. We wanted to build a tool like that for cell biology. 
So we took lots of reference single-cell RNA sequencing data from atlases and used our scalable machine learning algorithms to embed all of the gene expression data on these cells into compact vector representations; you can think of these as a signature for each cell.&nbsp;</p> <p>When you have a new cell you’re interested in studying, you can use CAS to compare and match your new cell with all these reference cells based on their signatures, and nominate cells that are similar to yours. It’s basically a search engine. You give it a cell, and it shows you similar cells. And when you give it a single-cell dataset, it generates annotations and labels for you by doing this search and carrying over the labels from similar cells to your cells.</p> <p> <strong>How did you build the search engine?</strong></p> <p>Several components of CAS were initially funded by the NIH through the Center for Human Brain Variation at the Broad Institute, where I serve as a co-investigator. We developed the <a href="https://cellarium.ai/tool/cellarium-ai-platform/">Cellarium AI platform</a>, which powers CAS, to support researchers at the center analyzing massive datasets generated from studying hundreds of human brains, spanning multiple brain regions and tens of thousands of cells per region. Around 2022, we were in discussions with 10x Genomics about potential collaborative research projects. During these conversations, we realized that the platform could be applied beyond its initial scope. CAS emerged as one of these applications, with additional funding provided by 10x Genomics.</p> <p>As the first step, we built a software platform that could store vast amounts of single-cell data, query those data, and then use them to train large machine learning models and generate these embeddings, or signatures, from lots of single-cell data. 
We trained our models on close to 87 million cells from nearly 1,400 published studies (all of the cells in the <a href="https://cellxgene.cziscience.com/">CZ CELLxGENE</a> repository, which has been built and curated by the Chan Zuckerberg Initiative). CZ CELLxGENE made sure these datasets were harmonized at the level of the metadata attached to the cells, which made the datasets really useful for machine learning.</p> <p> <strong>Can you give some examples of how biologists can use CAS and what they can learn from it?</strong></p> <p>One application is determining cell type. Let’s say you have a cell and you know its gene expression profile. You want to know: what is the crude type of the cell? Is it a T cell? If it's a T cell, is it a CD8+ T cell? If it is, is it like a naive, thymus-derived CD8+ T cell? Just by entering the gene expression profile of your new cell, you can narrow down the possibilities of what cell type you're dealing with.</p> <p>Another application is to identify whether the cell state you’re seeing is something typically encountered in tissues from healthy donors or in tissues from people with a particular disease. You can also ask: is this cell specific to the tissue you are studying or is it common to multiple tissues?&nbsp;</p> <p>Let’s say you have a therapeutic that is targeting a specific cell state identified in the context of a certain disease. You may want to know whether the same disease mechanism that is driven by these cells is present in other diseases. If the answer is yes, then you have a good hypothesis to extend the indication for that therapeutic to now include the new diseases.&nbsp;</p> <p> <strong>Is CAS now available to use?</strong></p> <p>Yes. The CAS model and framework we developed in collaboration with 10x Genomics are now offered to users in 10x Genomics’ Cloud Analysis Automated Cell Annotation pipeline. 
10x Genomics is a provider of instruments and assays for single-cell analysis, and the first interaction many users have with their single-cell data is through 10x software. We thought it would be interesting if that initial interaction could be more informative, so that not only would you see technical information about your experiment, such as the number of sequenced cells and their quality, but you’d also be able to learn more about those cells, like what cell types they are and all of the things we’ve talked about here.</p> <p>To make CAS accessible to a broader audience, including those looking to integrate the service into their own interactive or batch analysis workflows, we’re launching our implementation of CAS as a public beta service. Users can sign up by navigating to the <a href="https://cellarium.ai/tool/cellarium-cell-annotation-service-cas/">CAS landing page</a>, scrolling to the bottom of the page, and filling out the <a href="https://cellarium.ai/cell-annotation-service-cas-access/">sign-up form</a>.&nbsp;</p> <p>During the beta phase, CAS is offered at no cost, with a usage limit of 100,000 individually annotated cells per week and 200,000 individually annotated cells in total. This quota lets us provide the service to a larger and more diverse user base. Currently, the embedding model powering CAS is the same as the one used in the cell annotation pipeline offered by 10x Genomics, though future models and features may evolve separately in alignment with the development roadmaps of each organization.</p> <p> <strong>Overall, how can AI help advance cell biology?&nbsp;</strong></p> <p>One way is to make information more accessible and more integrated. We’re hoping CAS is taking the first step in that direction, by just making information more findable.&nbsp;</p> <p>The second way is to integrate all of the cell biology knowledge we have accumulated and keep accumulating into a cohesive fabric. 
Nowadays the paradigm is to build very large foundation models that have an integrated understanding of all the data that we've generated. This would allow us to make good predictions, by fine-tuning these models on cellular perturbation experiments, and this could potentially help unravel mechanisms underlying cell function that have remained hidden so far. This second problem is a different type of problem and it’s much harder. It's not just about being able to make old data more easily findable, but it's about being able to synthesize new data based on old data. That is our main vision for the future of all the work we're doing.</p> </div> </div> </div> </div> </div> </div> </div> <div class="content-section container"> <div class="content-section__main"> <div class="block-node-broad-tags block block-layout-builder block-field-blocknodelong-storyfield-broad-tags"> <div class="block-node-broad-tags__row"> <div class="block-node-broad-tags__title">Tags:</div> <div class="field field--name-field-broad-tags field--type-entity-reference field--label-hidden field__items"> <div class="field__item"><a href="/broad-tags/data-science" hreflang="en">Data Sciences Platform</a></div> <div class="field__item"><a href="/broad-tags/machine-learning-0" hreflang="en">Machine Learning</a></div> <div class="field__item"><a href="/broad-tags/single-cell" hreflang="en">Single Cell</a></div> </div> </div> </div> </div> </div>