America's current war with Iran is a good example of the dangers of using artificial intelligence in war. A newly released video suggests that a US Tomahawk missile struck a girls' school in Iran, killing 175 people, most of them children.

US media outlets have published many articles reporting that the US military is using artificial intelligence in its operations against Iran. These reports suggest that the military is using large language models (LLMs) such as Anthropic's Claude and OpenAI's ChatGPT for tasks including mission planning, logistics, and target identification.

I don't doubt that this strike was the result of an error in the target identification systems that the US military employed.

It seems clear that even at the Pentagon, the capabilities of large language models are poorly understood. Large language models generate fluent, grammatical language almost flawlessly, and this triggers a well-known cognitive bias called the halo effect. This bias colors our perception of everything else about a person or product simply because they excel at one thing, leading us to assume they are good at other things too.

For example, if a company produces really good TVs and then releases headphones, the halo effect would lead us to assume the headphones are of good quality as well.

Large language models are essentially mathematical functions that are very good at predicting the next token (a word or word fragment) in a sequence, based on the text that precedes it and on patterns learned from a large training corpus. They cannot and do not think like humans do. Humans are biological systems; large language models are digital ones.
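
To make the idea of next-token prediction concrete, here is a deliberately tiny sketch. It is not how a production LLM works internally (those use neural networks with billions of parameters); it only counts which word follows which in a toy corpus invented for this illustration. The function names are likewise made up for the example.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."

def train_bigram(text):
    """Count, for each token, which tokens follow it in the text."""
    tokens = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def next_token_probs(follows, token):
    """Turn the raw follow-counts for `token` into probabilities."""
    counts = follows[token]
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

model = train_bigram(corpus)
probs = next_token_probs(model, "the")

# The "prediction" is simply the most frequent follower of "the".
prediction = max(probs, key=probs.get)
print(prediction, probs)
```

The predictor picks "cat" after "the" only because that pairing is the most frequent in the corpus. Nothing in the procedure involves knowing what a cat is.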

As a result, large language models exhibit a number of attributes that are very dangerous in the context of war and demonstrate their lack of fitness for military use.

These include:

  • No instinct for self-preservation
  • No empathy
  • Algorithmic bias

It is humanity's drive for self-preservation that makes humans open to compromise and negotiation, without which the human race would surely end. A system with no stake in its own survival has no such incentive.

Modern warfare is supposed to be conducted according to rules of engagement that have evolved through a combination of international humanitarian law, past conflicts, and professional military codification. These rules came about because of our ability to empathize with other human beings and our desire to inflict as little suffering as possible on others.

Large language models are trained on data drawn from large portions of the public internet, extracted from websites and converted into machine-readable text. Anyone who has spent time on the internet can attest that this data is full of human prejudices. As a result, large language models reflect these prejudices in their outputs and behavior.

Beyond these core issues, there are several other concerns with using large language models in a military context.

Flash Wars or Hyper Wars

In modern warfare, strategic advantage is gained by compressing the Observe-Orient-Decide-Act (OODA) loop. When opposing AI systems are engaged in conflict against each other, this loop, and with it the tempo of battle, will accelerate beyond human comprehension. Artificial intelligence can act at superhuman speed, making it impossible for humans to react to or even follow its actions.

Inexplicability of decisions or actions

We do not yet fully understand how or why LLMs behave the way they do. They are effectively black boxes, with very little insight available into their internal workings. Current AI models are built to accept an input and predict the correct output. The model learns to do this by identifying patterns in its training data and repeatedly performing statistical calculations, reducing prediction error with each iteration. Which patterns the model has actually picked up, and why a given input produces a given output, is not well understood.
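
The "reduce prediction error with each iteration" loop can be sketched in a few lines. This is a minimal illustration using plain gradient descent on a one-parameter model; the data, learning rate, and step count are assumptions chosen for the example, not values from any real system.

```python
# Toy data, invented for illustration: y is roughly 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 5.9, 9.2, 11.8]

def mean_squared_error(w):
    """Average squared gap between the prediction w * x and the target y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0                # start from an uninformed guess
learning_rate = 0.01   # assumed step size for this toy problem
errors = []

for step in range(50):
    errors.append(mean_squared_error(w))
    w -= learning_rate * gradient(w)  # nudge w in the direction that lowers the error

print(f"learned w = {w:.3f}, first error = {errors[0]:.3f}, last error = {errors[-1]:.3f}")
```

Here the single learned number can be read directly: it ends up close to 3, the slope of the data. A large language model repeats the same kind of update across billions of parameters, and no individual parameter has a readable meaning of that sort, which is where the black-box problem comes from.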

Hallucinations

Large language models can generate factually incorrect or completely false responses when given a question or task where the training data is sparse. The model fills in the gaps with the most statistically plausible-sounding words, which may have absolutely nothing to do with reality. Sometimes the model is over-fitted to the data on a specific topic, causing it to make connections where none exist. In other cases, it may be under-fitted, causing it to guess based on little more than general context. Since large language models have no agency and no real-world experience, they lack the mental model of the world that humans rely on when making judgments.
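
One reason gaps get filled rather than flagged can be shown with a simplified sketch. Many models end in a softmax step that converts raw scores into probabilities, and picking the top entry always yields an answer, however weak the evidence behind it. The candidate labels and score values below are hypothetical, and this is only one contributing mechanism, not a full account of hallucination.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate answers and scores, invented for illustration.
candidates = ["alpha", "beta", "gamma"]
well_supported = [0.2, 0.1, 6.0]   # the data clearly favors one answer
sparse = [0.11, 0.10, 0.12]        # the data barely distinguishes the options

def answer(scores):
    """Return the top-ranked candidate and the probability assigned to it."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return candidates[best], probs[best]

label_hi, conf_hi = answer(well_supported)
label_lo, conf_lo = answer(sparse)
print(label_hi, round(conf_hi, 3))
print(label_lo, round(conf_lo, 3))
```

Both calls return the same label. In the second case it is little better than a coin toss among three options, but unless the caller inspects the probability, the output looks exactly as assured as in the first.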

In a military context specifically, cross-modal hallucinations, which are common in AI vision models, present a significant likelihood of error. The AI may report an object in an image that is not actually there. This may have been a factor in the strike on the girls' school in Iran.

These are some of the broad and well-known reasons for not using artificial intelligence in military applications. This technology is relatively new, and the full spectrum of its dangers is yet to be discovered.