Skimming the actual paper ... it seems pretty bad?
The thing about Beethoven's 9th and biological materials mentioned in the OP is just that, out of a very large knowledge graph, they found a small subgraph isomorphic to a subgraph built from a text about the symphony. But they never address the fact that any sufficiently large graph with certain high-level statistical properties will contain small subgraphs isomorphic to a 'query' graph. Is this particular match meaningful in some way, or is it just an inevitable consequence of having produced such a large knowledge graph in the first place? The reader can't really tell, because figure 8, which presents the two graphs, has such poor resolution that none of the labels are readable. We're just expected to see "oh, the nodes and their degrees match, so it has the right shape", but that doesn't demonstrate that their system gained any insight through this isomorphism-based mining process.
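To make the null hypothesis concrete, here's a minimal sketch (mine, not the paper's) showing that a generic random graph of comparable character will typically contain a subgraph isomorphic to a small query graph, using networkx's VF2 matcher:

```python
# Hypothetical illustration: does a large random graph contain a subgraph
# isomorphic to a small, arbitrary "query" graph? (Usually yes.)
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

# Stand-in for the paper's large knowledge graph: a scale-free random graph.
big = nx.barabasi_albert_graph(n=5000, m=3, seed=0)

# Stand-in for a small graph distilled from a text about the symphony.
query = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0), (1, 3), (3, 4)])

# VF2 searches for a node-induced subgraph of `big` isomorphic to `query`.
print(GraphMatcher(big, query).subgraph_is_isomorphic())  # very likely True
```

If that prints True for a graph with no semantic content at all, then an isomorphic match on its own tells us very little.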
For the stuff about linking art (e.g. a Kandinsky painting) with material design: they used an LLM to generate a description of a material for DALL-E, where the prompt includes information about the painting, and then they show the resulting image next to the painting. But there's no measure of what a "good" material description is, and there is certainly no evaluation of the contribution of the graph-based "reasoning". In particular, an obvious baseline would be: "Describe this painting." -> "Construct a prompt for DALL-E to portray a material whose structure has properties informed by this description of a painting ..." -> render.
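A minimal sketch of that graph-free baseline, assuming the current OpenAI Python client (model names and prompts here are illustrative stand-ins, not the paper's actual pipeline):

```python
# Hypothetical no-graph baseline: describe painting -> material prompt -> render.
from openai import OpenAI

client = OpenAI()

description = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Describe this painting: Kandinsky, Composition VII."}],
).choices[0].message.content

material_prompt = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Construct a prompt for DALL-E to portray a material "
                          "whose structure has properties informed by this "
                          f"description of a painting:\n{description}"}],
).choices[0].message.content

image_url = client.images.generate(model="dall-e-3",
                                   prompt=material_prompt).data[0].url
print(image_url)
```

If the render from this looks about as "Kandinsky-informed" as the paper's, the knowledge graph contributed nothing measurable.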
It really seems like the author threw a bunch of stuff against the wall and didn't even look particularly closely to see if it stuck.
Also, the only equation in the paper is the author giving the definition of cosine similarity, followed by two paragraphs justifying its use in constructing their graph. Like, who is the intended audience?
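For reference, that lone equation is just the textbook definition (presumably applied over node or text embeddings to decide what gets connected):

```latex
\operatorname{sim}(\mathbf{u},\mathbf{v}) \;=\; \cos\theta \;=\; \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}
```

Anyone who needs two paragraphs of justification for that is presumably not the audience for the rest of the methodology.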
https://iopscience.iop.org/article/10.1088/2632-2153/ad7228#...
Great writeup, thanks! That Kandinsky quote is what set off alarm bells for me, as it seems like a quintessential failure case for laypeople understanding LLMs -- they take some basic, vague insights produced by a chatbot as profound discoveries. It seems the reviewers may have agreed, to some extent; note that it was received by Machine Learning on 2024-03-26 but only accepted (after revisions) on 2024-08-21.
I wrote more below with a quote, but re: "who's the intended audience?" I think the answer is the same kind of people Gary Marcus writes for: other academic leaders, private investors, and general technologists. Definitely not engineers looking to apply the work immediately, nor the vast majority of scientists who are doing the long, boring legwork of establishing facts.
In that context, I would defend the paper as evocative and creative, even though your criticisms all ring true. Like, take a look at their (his?) HuggingFace repo (https://huggingface.co/lamm-mit). It seems clear that they're doing serious work with real LLMs, even if it's scattershot.
Honestly, if I were a prestigious department head with millions at my disposal in an engineering field, I'm not sure I would act any differently!
ETA: Plus, I'll defend him purely on the basis of having a gorgeous, well-documented Git repo for the project: https://github.com/lamm-mit/GraphReasoning?tab=readme-ov-fil... Does this constitute scientific value on its own? Not really. Does it immediately bias me in his favor? Absolutely!
Thank you for taking the time to read and write this up; something was "off" in the quotes describing the materials that had me at 4 of 5 alarm bells ringing. Now I can super skim confidently and giggle.
- the real output here is text, from a finetuned Mixtral given leading questions
- the initial "graph" with the silly Beethoven-inspired material is probably hand-constructed; they don't describe its creation process at all
- later, they're constructing graphs with GPT-3.5 (!?) (they cite rate limits, but something's weird about the whole thing; they're talking about GPT-4 vision preview etc., which came out roughly a year before the paper was released)
- the whole thing reads like someone had a long leash to spend a year or two exploring basic consumer LLMs, finetuned one model, and sorta just published whatever they got 6 months to a year later.
> and sorta just published whatever they got 6 months to a year later.
Publish and perish...
I thought it was "publish xor perish" but, huh, it really is 'or'.
The paper is a tremendous effort of passion and love for the art of science and the science of deriving discovery from art. I assure you, this person is someone to pay attention to and I hope they never give up on loving the work they do.
Found the author.
I’ve actually been thinking a lot about how LLMs need to bridge the gap to symbolic reasoning and was very much waiting for something like this in theory… but this ain’t it.
Looking forward to a more serious effort.
We're adding symbolic verification to LLM-generated SQL code at http://sql.ai
> One comparison revealed detailed structural parallels between biological materials and Beethoven’s 9th Symphony, highlighting shared patterns of complexity through isomorphic mapping.
This is not serious.
> The resulting material integrates an innovative set of concepts that include a balance of chaos and order, adjustable porosity, mechanical strength, and complex patterned chemical functionalization. We uncover other isomorphisms across science, technology and art, revealing a nuanced ontology of immanence that reveal a context-dependent heterarchical interplay of constituents.
The article itself seems generated.
I encounter this take more and more, where jargony sciencey language is dismissed as "generated". We forget that actual people do write like this, and self-satisfied researchers especially so.
More likely, this author read a bit too much Deleuze and is echoing that language to make the discovery feel more important than incidental.
If you write in a manner that gets you dismissed as a chatbot, then you've still failed to communicate, even if you physically typed the characters on the keyboard. The essence of communication isn't how nice the handwriting is, it's how usefully you've conveyed the information.
Paste it into any AI detector (e.g., https://quillbot.com/ai-content-detector). They're not perfect, but they're pretty good in the aggregate. This text is almost certainly generated by an LLM.
I ran this through my AI spelling and grammar checker at
https://app.gitsense.com/?doc=4715cf6d95689&other-models=Cla...
Note: sentences highlighted in yellow mean that one or more models disagree.
The sentence that makes me think this might not be AI generated is:
"Researchers can use this framework to answer complex questions, find gaps in current knowledge, suggest new designs for materials, and predict how materials might behave, and link concepts that had never been connected before."
The use of "and" before "predict how materials" was obviously unnecessary and got caught by both gpt-4o and claude 3.5 sonnet and when I questioned Llama 3.5 about it, it also agreed.
As AI-generated text goes, it seems like there are too many imperfections, which makes me believe it may well have been written by a human.
I'm not sure this is a useful test. You can most certainly get an LLM to infinitely "correct" or "improve" its own output. But take the "The work uses graphs..." paragraph and plop it into an AI text detector like Quillbot. It's a long and non-generic snippet of text, and it will score 100% AI. This is not something that happens with human writing. Sometimes, you get false positives on short and generic text, sometimes you get ambiguous results... but in this case, the press release is AI.
I have no doubt the author of the press release used an LLM to help them, but I'm not convinced that it was fully generated by AI. Since you got me thinking about this more, I decided to run the sentence through my tool with a new prompt that asks the LLM to decide. Both Claude and Llama believe there is a 55% or greater chance, while GPT-4o and GPT-4o-mini put it below 55%.
https://app.gitsense.com/?doc=381752be7fd0&prompt=Is+AI+Gene...
Edit:
I created another prompt that tries to analyze things more carefully, and the models all agree that it is most likely AI (60%+). The highest was GPT-4o-mini at 83%.
https://app.gitsense.com/?doc=381752be7fd0537&prompt=Is+AI+G...
I messed up and analyzed the article rather than the paper. Since I can't edit, here is the introduction to the paper:
https://app.gitsense.com/?doc=696357b733b
which contains enough grammatical errors that I'm pretty sure it was not generated by AI.
It's definitely run-on academic writing that didn't get enough editing. It's consistently bad in ways LLMs typically correct for.
Run your papers through AI and have it identify simple corrections. It's like having an endlessly patient English Literature major at your beck and call.
Oh, I'm glad I'm not the only one who has gotten lost in the sauce by asking LLMs to recursively synthesize data towards some grand insight -- we want to see results even when none are apparent. What you end up getting is bizarre theories overfit on the data with zero causal relationships. LLMs are fundamentally pattern-matching systems, and they will find "connections" between any two domains if prompted. It reeks of confirmation bias; researchers looking for connections between art and science will find them.
The simpler explanation makes more sense: knowledge graphs naturally show certain structural properties, and these properties appear across domains due to basic mathematical constraints, common organizational principles, and human cognitive patterns reflected in data. Sure, LLMs trained on human knowledge can identify these patterns, generate plausible narratives, and create appealing connections - but this doesn't necessarily indicate novel scientific insights, predictive power, or practical utility.
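A quick sketch of that point: two graphs grown independently by the same preferential-attachment mechanism (stand-ins for two unrelated domains) come out with near-identical degree statistics, no shared meaning required:

```python
# Hypothetical illustration: shared structure from the growth process alone.
import networkx as nx

domain_a = nx.barabasi_albert_graph(n=2000, m=2, seed=1)  # "music"
domain_b = nx.barabasi_albert_graph(n=2000, m=2, seed=2)  # "biology"

for name, g in [("domain A", domain_a), ("domain B", domain_b)]:
    degrees = [d for _, d in g.degree()]
    print(name, "mean degree:", sum(degrees) / len(degrees),
          "max degree:", max(degrees))
# Both report a mean degree near 4 and a similarly heavy-tailed maximum.
```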
If you find yourself going down a rabbit hole like this (and trust me, we've all been there), my advice is to ask "is there a simpler explanation that I'm missing?" Then start from square one: specific testable hypotheses, rigorous controls, clear success metrics, practical demonstrations, and independent validation. And maybe add a "complexity budget" - if your explanation requires three layers of recursive AI analysis to make sense, you're probably way too deep in the sauce.
I think this article marks the "peak of inflated expectations" of AI for HN posts.
Since all humans alive today have undergone the sum total of human evolution, and are the ultimate creation of millions of years of it, it makes sense that the kinds of things we find "artistically pleasing" (both visually and through sound) could carry patterns that apply to reality in deeper ways than any of us know, and so letting AI use art as its inspiration, applying those patterns in its search for new knowledge, seems like a good idea.
There are also certain aspects of physical geometric relationships, and even sound relationships, that could not be conveyed to an AI by any means other than art and music. So using art to inspire science is definitely a good approach.
Even the great physicists throughout history have often appreciated the beauty in the mathematical symmetries and relationships exhibited by nature, so there is definitely a connection, even if one not quite tangible or describable by man.
The author removed bridge construction from the civil engineering curriculum at MIT when he was heading the department (competitive steel bridge building is a big thing between CE departments in the US).
He said they were producing too many engineers and not enough scholars. When alumni offered to endow the program (in case it was a funding issue) he refused our donations.
Which makes this “scholarship” of chaining together some GPT prompts especially insulting.
Since the article mentions graphs, I'd like to ask: what are the advantages of graph databases over relational ones? Graph databases have become popular in RAG-related topics, mainly through the GraphRAG work by MS. So I wonder whether the same RAG accuracy could be achieved with traditional databases. Or, if graph databases are an absolute must, what are their limitations? Are there any successful production use cases of graph databases?
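For what it's worth, multi-hop traversal (the thing graph databases are usually sold on) is expressible in plain SQL with a recursive CTE, so for RAG they're more a convenience than an absolute must. A minimal sketch with an illustrative toy schema, using SQLite:

```python
# Hypothetical edge table; 3-hop reachability via a recursive CTE.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (src TEXT, dst TEXT);
    INSERT INTO edges VALUES
        ('silk', 'protein'), ('protein', 'beta-sheet'),
        ('beta-sheet', 'toughness');
""")

rows = conn.execute("""
    WITH RECURSIVE reach(node, depth) AS (
        SELECT 'silk', 0
        UNION
        SELECT e.dst, r.depth + 1
        FROM reach r JOIN edges e ON e.src = r.node
        WHERE r.depth < 3
    )
    SELECT node, depth FROM reach
""").fetchall()
print(rows)  # e.g. [('silk', 0), ('protein', 1), ('beta-sheet', 2), ('toughness', 3)]
```

The tradeoff is mostly ergonomics and performance on deep, high-fanout traversals, which is where native graph stores shine.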
Is it just me or does this read like complete word soup?
> The application could lead to the development of innovative sustainable building materials, biodegradable alternatives to plastics, wearable technology, and even biomedical devices.
That a transform from materials to a 20th-century Russian painter somehow is applicable to what just so happens to be the zeitgeist of materials science beggars belief.
One to save for April 1st
We should probably flag this article out of existence, as it's pure garbage. Quite strange that it's getting enough upvotes to stay on the front page with literally zero positive comments. The OP has an interesting history of posting lots of low-quality articles.
Maybe SEO/Amazon-esque shadow work is keeping it on the front page.
Can we call this "Deep Trolling"?
The very fact that some people are trying to take this seriously is probably the point he’s trying to make.
Wow, what's happened to MIT?
Well...
> Markus J. Buehler is the McAfee Professor of Engineering and former Head of the MIT Department of Civil and Environmental Engineering at the Massachusetts Institute of Technology. He directs the Laboratory for Atomistic and Molecular Mechanics (LAMM), leads the MIT-Germany program, and is Principal Investigator on numerous national and international research program... [he] is a founder of the emerging research area of materiomics. He has appeared on numerous TV and radio outlets to explain the impact of his research to broad audiences.
I think this guy's just playing political/journalistic games with his research, and tailoring it for impact rather than rigor. I'm not sure I endorse it necessarily, but I don't think we should write this off as "dumb article from MIT", but rather as "the explorations of a media-savvy department head". That doesn't excuse the occasional overselling of results, of course, as that's dangerous to science no matter the motivation.
Did it actually make a novel material that is plausibly useful?
"The Future of Innovation" sounds exactly like freshly squeezed GPT drivel I'd expect to read from a vapid "hustler" on LinkedIn.
...k