I think the "lethal trifecta" framing is useful, and I'm glad attempts are being made at this! But there are two big, hard-to-solve problems here:
1. The "lethal trifecta" is also the "productive trifecta": people want to use LLMs to operate in this space because that's where much of the value is, namely using private/proprietary data to interact with (do I/O with) the real world.
2. I worry that there will soon be (if there isn't already) a fourth leg to the stool: latent malicious training within the LLMs themselves. I know the AI labs are working on this, but ferreting out Manchurian Candidates embedded within LLMs may well be the greatest security challenge of the next few decades.
1. How are you defending against one MCP server poisoning your firewall LLM into misclassifying other MCP tools?
2. How would you make sure the LLM shows the warning, given that LLMs are non-deterministic?
3. How clear do you expect MCP specs to be for your classification step to be trustworthy? To the best of my knowledge, no spec outlines how to "label" a tool along the 3 axes, so you've got another non-deterministic step here. Is "writing to disk" external communication? It is if that directory is exposed to the web. How would you know?
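To make the "3 axes" concrete, here is a minimal sketch, assuming each tool carries boolean labels for the three trifecta legs (access to private data, exposure to untrusted content, ability to communicate externally). The gateway's real data model isn't shown in this thread; `ToolLabels`, `trifecta_complete`, and `label_write_file` are hypothetical names. The blocking check itself is deterministic; the hard part the question raises is producing trustworthy labels in the first place.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ToolLabels:
    """Hypothetical per-tool labels for the three lethal-trifecta axes."""
    name: str
    reads_private_data: bool
    sees_untrusted_content: bool
    communicates_externally: bool


def trifecta_complete(tools: Iterable[ToolLabels]) -> bool:
    """Return True if the enabled tools jointly cover all three legs."""
    tools = list(tools)
    return (
        any(t.reads_private_data for t in tools)
        and any(t.sees_untrusted_content for t in tools)
        and any(t.communicates_externally for t in tools)
    )


def label_write_file(dir_is_web_served: bool) -> ToolLabels:
    # Hypothetical labeler: writing to disk only counts as external
    # communication if the directory is exposed to the web. That fact is
    # not in the tool description, so it must come from outside the tool.
    return ToolLabels("write_file", False, False, dir_is_web_served)


# Usage: the same tool set flips from allowed to blocked purely on
# deployment context the gateway may not be able to see.
inbox = ToolLabels("read_inbox", True, True, False)
fetch = ToolLabels("fetch_url", False, True, False)
print(trifecta_complete([inbox, fetch, label_write_file(False)]))  # False
print(trifecta_complete([inbox, fetch, label_write_file(True)]))   # True
```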
How do you determine whether the tools access private data? Is it based solely on the tool description (which can be faked), on running the tools in a sandboxed environment, or on analyzing their code?
Show HN: An MCP Gateway to block the lethal trifecta
(github.com) 42 points by 76SlashDolphin 11 hours ago | 22 comments
Comments
Sounds like it defeats the point.