Resources
Welcome to your curated resource list for the hackathon. This guide provides a starting point for learning key techniques, finding datasets, and exploring the essential tools and research papers at the intersection of Large Language Models (LLMs), materials science, and chemistry.
Tutorials & Learning Resources
Foundational LLM Concepts
Intro to Large Language Models
A general-audience overview of how LLMs are trained and the key concepts behind their operation, including pre-training, fine-tuning, and reinforcement learning from human feedback (RLHF).
Library-Specific Tutorials
RDKit
RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python.
Official Documentation: The "Getting Started with the RDKit in Python" guide is the best place to begin.
YouTube Tutorial: The video tutorial by Jan Jensen is another good starting point.
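As a first taste of what RDKit does, here is a minimal sketch, assuming RDKit is installed (for example via `pip install rdkit`). The molecule (ethanol) and the variable names are chosen purely for illustration; the official guide covers these calls in more depth.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Parse a SMILES string (ethanol, chosen for illustration) into a molecule.
mol = Chem.MolFromSmiles("CCO")

# MolFromSmiles returns None when the input cannot be parsed.
if mol is None:
    raise ValueError("could not parse SMILES string")

num_heavy_atoms = mol.GetNumAtoms()    # hydrogens are implicit by default
mol_weight = Descriptors.MolWt(mol)    # average molecular weight in g/mol
canonical = Chem.MolToSmiles(mol)      # canonical SMILES for the molecule

print(num_heavy_atoms, round(mol_weight, 2), canonical)
```

Checking for `None` right after parsing is worth keeping as a habit, since LLM-generated SMILES strings are often invalid.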
PySCF
The Python-based Simulations of Chemistry Framework (PySCF) is an open-source library for quantum chemistry calculations. It is highly extensible and designed for simplicity, both for users and developers.
Official Documentation: The 'User Guide' and 'Tutorials' are great starting points.
Atomic Simulation Environment (ASE)
The Atomic Simulation Environment (ASE) is a set of tools and Python modules for setting up, manipulating, running, visualizing and analyzing atomistic simulations.
Official Documentation: The 'ASE Tutorials' are the best place to begin.
Pymatgen
Pymatgen (Python Materials Genomics) is a robust, open-source Python library for materials analysis.
Official Documentation: The pymatgen API documentation is the best place to begin.
YouTube Tutorial: The video tutorial by Anubhav Jain, a developer of pymatgen, is another good starting point.
LangChain
LangChain is an open-source library specifically designed for creating applications using large language models (LLMs).
Official Documentation: The best place to begin is the official LangChain documentation. It offers a comprehensive overview of the framework, from installation to advanced use cases.
YouTube Tutorial: For visual learners, the aiwithbrandon YouTube channel provides a wealth of tutorials. The "LangChain Master Class For Beginners 2024" video is an excellent starting point.
LangGraph
LangGraph, created by LangChain, is an open-source AI agent framework designed to build, deploy, and manage complex generative AI agent workflows.
Official Documentation: To dive into building stateful, multi-actor applications, the LangGraph documentation is your go-to resource.
YouTube Tutorial: A great video tutorial by LangChain on "Building Effective Agents with LangGraph" provides a practical introduction to creating sophisticated agents.
Technique-Specific Guides
Fine-Tuning LLMs
Fine-tuning allows you to adapt pre-trained models to specific tasks or domains, making them more effective for specialized applications in materials science and chemistry.
Materials Science & Chemistry Datasets
Materials Science Datasets Compilation
A curated list of awesome materials and chemistry datasets by Ben Blaiszik.
General Datasets
Key Research Papers & Reviews
Examples & Reviews on LLMs in Materials & Chemistry
Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry
In the second global hackathon on LLM applications for materials and chemistry, 34 teams used large language models to build applications for materials science and chemistry research across seven areas, including property prediction, molecular design, and scientific communication.
14 examples of how LLMs can transform materials science and chemistry: a reflection on a large language model hackathon
In the first global hackathon on LLM applications for materials and chemistry, participants used large language models such as GPT-4 to build working prototypes for chemistry and materials science applications in just two days.
A Survey of AI for Materials Science: Foundation Models, LLM Agents, Datasets, and Tools
A comprehensive review of how large AI foundation models (such as ChatGPT-style systems) are being used to accelerate materials science research across six key areas, from data analysis to the discovery of new materials.