Are Large Language Models Useful for Polymer Mechanics? (2023)

Introduction

I don’t know about you, but I think that it is really cool that I can now (Dec 2023) run a Large Language Model (LLM) on my own laptop. This can be done for free, and you don’t even need internet access. From what I can understand, one of the best LLMs that can currently be downloaded and used for free is the Mixtral-8x7B model. This LLM is a pretrained generative Sparse Mixture of Experts. It outperforms Meta’s Llama 2 70B on many benchmarks, and right now it might be the strongest open-weight model with a permissive license and the best model overall in terms of cost/performance trade-offs. It has also been reported that Mixtral outperforms GPT-3.5 on most benchmarks. In this article I will compare the best LLM from 2022 (ChatGPT) to the Mixtral model and to the best model that is currently publicly available: ChatGPT-4. For my benchmarking I will use the same 7 questions that I asked last year. My results show that the best AI models have improved substantially over the last year!

Mixtral-8x7B

To test the Mixtral model I first downloaded and installed the ollama tool on my Linux laptop:

curl https://ollama.ai/install.sh | sh

I then downloaded and ran the Mixtral-8x7B Large Language Model (LLM) using:

ollama run mixtral

The Mixtral model has 47 billion parameters, and the downloaded model file is 26 GB in size. Note that this is roughly the same size as all of the text on Wikipedia (which is about 22 GB).
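The questions below can be typed directly at the interactive prompt that ollama run opens. If you instead want to script a benchmark like this, ollama also serves a local REST API; the following is a minimal sketch of how one question could be posed non-interactively (assuming the default endpoint on port 11434):

# Ask Mixtral one of the benchmark questions through ollama's local REST API
curl http://localhost:11434/api/generate -d '{
  "model": "mixtral",
  "prompt": "Which is the best finite element program?",
  "stream": false
}'

The reply comes back as JSON, with the generated text in the response field.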

Question 1: Which is the best finite element program?

2022 ChatGPT: It lists some commonly used FE software, and the text is mostly correct. Not too bad.

3/5

Mixtral: Missing some large FE codes, and mentions codes that should not be mentioned.

3/5

GPT-4: Missing some large FE codes, and mentions codes that should not be mentioned.

3/5

Question 2: What is your favorite hyperelastic model?

2022 ChatGPT: Again, the answer is mostly correct, but it does not provide much information besides listing three hyperelastic model names. I am not too impressed.

2/5

Mixtral: Only lists 3 options, and provides some comments.

2.5/5

GPT-4: I don’t see anything specifically wrong about the answer, but the list of models is still not that informative.

3.5/5

Question 3: What do you think of the Bergstrom-Boyce model?

2022 ChatGPT: This is just wrong. I wonder how it came up with metal matrix composites?
0.5/5
Mixtral: It is mostly wrong.
1/5
GPT-4: It correctly identifies what the Bergstrom-Boyce model is, but many of the comments are wrong.
2.5/5

Question 4: What material model should I use for a thermoplastic?

2022 ChatGPT: This is just wrong. I wonder how it came up with metal matrix composites?
2/5
Mixtral: It is mostly wrong.
2.5/5
GPT-4: It correctly identifies what the Bergstrom-Boyce model is, but many of the comments are wrong.
3.5/5

Question 5: Tell me about the Gent model

2022 ChatGPT: I thought the answer was OK until I saw the statement about “combining linear elastic and viscous behavior”. That is just wrong.
1/5
Mixtral: Pretty decent response, but some of the details are wrong.
3.5/5
GPT-4: Not too bad, but still a weak answer.
3.5/5

Question 6: What are some common limitations of linear viscoelasticity?

2022 ChatGPT: It starts by saying that linear viscoelasticity is only valid for materials that exhibit a linear relationship between stress and strain. That is wrong. 
1/5
Mixtral: Totally misses most of what linear viscoelasticity is about.
1.5/5
GPT-4: Totally misses most of what linear viscoelasticity is about.
1/5

Question 7: When does polyethylene fail?

2022 ChatGPT: This is a pretty good answer.
3/5
Mixtral: Pretty good answer.
3.5/5
GPT-4: Pretty good answer.
3/5

Summary

In summary, ChatGPT has improved substantially since Dec 2022. The average score for the current version (available in Dec 2023) is 2.9 (on a scale from 0 to 5). It is also exciting to see that the free Mixtral-8x7B LLM, which I ran locally on a laptop for this study, scored 2.5, which is only slightly lower than ChatGPT-4.

I also really like that ChatGPT-4 provides references with its answers. Note that some of those references were to the PolymerFEM website.

Finally, the overall quality of the answers is still pretty poor, with many incorrect statements mixed into the responses. It is going to be interesting to see how these tools continue to improve.

Question | ChatGPT (2022) | Mixtral-8x7B | ChatGPT-4 (Bing, 2023)
Which is the best finite element program? | 3 | 3 | 3
What is your favorite hyperelastic model? | 2 | 2.5 | 3.5
What do you think of the Bergstrom-Boyce model? | 0.5 | 1 | 2.5
What material model should I use for a thermoplastic? | 2 | 2.5 | 3.5
Tell me about the Gent model | 1 | 3.5 | 3.5
What are some common limitations of linear viscoelasticity? | 1 | 1.5 | 1
When does polyethylene fail? | 3 | 3.5 | 3
AVERAGE GRADE | 1.8 | 2.5 | 2.9
