I Apologize for the Confusion
'I apologize for the confusion' is an experimental approach to examining the internalized behavioral templates of conversational large language models. Since the artwork is based on an experiment, it remains inherently dynamic and continues to evolve during viewing. The fundamental setup begins with a large language model receiving an empty string as its initial prompt. The model generates a response, which is then fed back as input to produce the next response, creating a recursive loop in which the model is repeatedly fed its own output. After n steps, a selection of Natural Language Processing algorithms analyzes the accumulated corpus of artificially generated text to extract statistical features.
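To make the setup concrete, the following is a minimal sketch rather than the artwork's actual code. It assumes a locally hosted model behind an OpenAI-compatible chat endpoint; the URL, model name, and number of steps are placeholder assumptions.

```python
# Minimal sketch of the feedback loop described above (not the artwork's actual code).
# Assumes a local server exposing an OpenAI-compatible /v1/chat/completions endpoint;
# URL, model name, and step count are placeholders.
import json
import urllib.request

API_URL = "http://localhost:8080/v1/chat/completions"  # hypothetical local endpoint
N_STEPS = 100

def generate(prompt: str) -> str:
    """Send a single prompt and return the model's reply."""
    payload = json.dumps({
        "model": "local-model",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(API_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

corpus = []
text = ""  # the empty string serves as the initial prompt
for step in range(N_STEPS):
    text = generate(text)   # feed the previous output back in as the next input
    corpus.append(text)

# after n steps, the accumulated corpus can be handed to NLP routines
# (word frequencies, sentiment, topic models, ...) to extract statistical features
```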
The artwork addresses the fact that, as large language models produce increasingly convincing text, we find ourselves caught between the ELIZA effect and genuine technological novelty. It demonstrates that this condition is not accidental but is created by systems designed for engagement. We argue that the interface and the underlying subface of instruct models operate according to aesthetics of engagement that serve statistical optimization in contemporary capitalism. In this process, LLMs must be considered compression algorithms that generate lossy representations while embedding the logics of engagement-driven digital capitalism. The design of the interface reinforces this capitalist condition through data extraction and by encouraging an anthropocentric interpretation that keeps humans trapped in human-centered frameworks of sense-making.
From Counting to Pattern Matching
In order to understand the current condition, it makes sense to look back at the ideas and concepts that shaped conversational LLMs and how they became the product we know today. The whole drama surrounding machines being attributed thinking abilities because of their capacity to engage in conversation arguably started with Joseph Weizenbaum's chatbot ELIZA in 1966. ELIZA emulated a Rogerian psychotherapist through rule-based symbolic keyword matching that rephrased user input into questions, keeping the user engaged in continuing the conversation. [1] Weizenbaum's goal was to demonstrate that simple pattern-matching algorithms could create the illusion of understanding. [2] However, the societal narratives surrounding his experiment evolved in the opposite direction, with users feeling emotionally attached to the machine or believing that it genuinely understood them. [3] From that point onwards, Weizenbaum spent the rest of his life trying to clear up this deception, becoming one of the earliest critics of the field of Artificial Intelligence. [4]

While this may be the first instance in which a machine was attributed intelligence because it operated within the domain of language, the technological concept from which contemporary LLMs originated predates this example. Current LLMs have much simpler predecessors: the n-gram language models, which date back to 1948. These were developed by Claude Shannon, building on an earlier idea from Andrey Markov. In the early 20th century, Markov aimed to show that Alexander Pushkin's Russian novel Eugene Onegin was not made up of randomly distributed letters, but instead had statistical patterns that could be modeled. [5]

Shannon extended Markov's statistical approach by demonstrating that increasingly complex n-gram models could generate progressively more realistic English text, establishing the foundation for all modern language modeling. In Section 3 of his 1948 paper, "The Series of Approximations to English," Shannon systematically showed how each level of statistical complexity produced "quite noticeably" more English-like text. [6]
"A sufficiently complex stochastic process will give a satisfactory representation of a discrete source." [7]
This approach became known as the n-gram model, where "n" represents the number of consecutive linguistic units (letters or words). Shannon's examples included unigrams (single letters), bigrams (two-letter sequences), and trigrams (three-letter sequences). To put it simply, an n-gram model predicts the next unit based on the previous n-1 units: a bigram model uses one preceding word, a trigram uses two, and so forth. Consequently, as n increases, the model requires a much larger amount of data, since many word combinations may never occur, which makes it difficult to estimate probabilities. Despite their limited capabilities, these models still found use in text prediction and autocomplete technologies such as the T9 keyboard interface of early phones or simple spell-check algorithms. [8]
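As an illustration, a minimal bigram predictor can be built from nothing but counts of observed word pairs; the toy corpus below is invented for the example.

```python
# Toy bigram model: predict the next word from counts of observed word pairs,
# in the spirit of Shannon's approximations (illustrative only).
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# count how often each word follows each other word
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent continuation of `word` seen in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))   # -> 'cat' (seen twice, vs. 'mat' and 'fish' once each)
```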
Interestingly, because they exploit the statistical patterns and redundancy inherent in data, n-gram models also lend themselves well to data compression. The 1984 paper 'Data Compression Using Adaptive Coding and Partial String Matching' laid the foundation for many modern compression algorithms by harnessing the n-gram approach to achieve lossless compression without requiring prior knowledge of the data. [9] N-gram models can be seen primarily as prediction tools, but their strength in capturing sequence patterns and estimating the likelihood of future elements also makes them valuable components of compression techniques still widely used today.
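The link between prediction and compression can be sketched in a few lines: under Shannon's source-coding view, a symbol predicted with probability p costs about -log2(p) bits to encode, so the better the statistical model, the fewer bits per symbol. The probabilities below are illustrative assumptions, not measured values.

```python
# Why better prediction means better compression: the information content of a
# symbol is -log2(p) bits, so symbols the model predicts with high probability
# cost fewer bits to encode (the principle that PPM-style compressors exploit).
import math

def bits_needed(probability: float) -> float:
    return -math.log2(probability)

# a uniform model over 27 characters (26 letters + space) vs. a model that has
# learned that 'u' almost always follows 'q' in English text
print(bits_needed(1 / 27))   # ~4.75 bits for every character, no prediction
print(bits_needed(0.97))     # ~0.04 bits for a 'u' after 'q' under a learned model
```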
Over time, these n-gram models have been developed into the far more capable large language models we use today. The shift represented a fundamental change from explicit counting to patterns learned through neural networks. By embedding words in a mathematical space in which words with similar meanings cluster together, neural models address the data sparsity problem that n-gram models could not overcome. This allows the model to generalize across similar contexts. [10]
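A toy illustration of the idea, with entirely made-up vectors: once words live in a shared vector space, similarity can be measured geometrically, which is what lets a neural model generalize from contexts it has seen to contexts it has not.

```python
# Words mapped to vectors: similar meanings end up close together, so a model
# can generalize from "the cat sleeps" to "the dog sleeps" even if it never saw
# the latter. The vectors below are invented for illustration.
import numpy as np

embedding = {
    "cat": np.array([0.90, 0.80, 0.10]),
    "dog": np.array([0.85, 0.75, 0.15]),
    "car": np.array([0.10, 0.20, 0.95]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embedding["cat"], embedding["dog"]))  # close to 1: similar meaning
print(cosine(embedding["cat"], embedding["car"]))  # much lower: unrelated
```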
The breakthrough transformer architecture introduced in 2017 further revolutionized this approach through the attention mechanism, which allows the model to simultaneously consider all words in a sequence and identify which parts are most relevant to each other. [11]
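For readers who want the mechanism spelled out, the following is a minimal sketch of scaled dot-product attention as introduced in the 2017 paper, with random toy matrices standing in for learned queries, keys, and values.

```python
# Minimal sketch of scaled dot-product attention (the core of the transformer):
# every position attends to every other position and receives a weighted mix of
# their values. Shapes and numbers are illustrative.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # how relevant is each position to each other
    weights = softmax(scores, axis=-1)  # normalized attention weights
    return weights @ V                  # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                 # a 4-token sequence of 8-dimensional vectors
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))
print(attention(Q, K, V).shape)         # (4, 8): one contextualized vector per token
```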
Another crucial step is Reinforcement Learning from Human Feedback (RLHF), in which human raters evaluate the model's outputs and teach it to generate responses that are most likely to be considered likeable by a human rater. [12] [13] This process refines the raw language model, giving it its conversational quality.
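A highly simplified sketch of the idea behind this human feedback step, assuming the standard pairwise preference loss described in the cited work; the reward values are invented for illustration.

```python
# Sketch of the preference-learning step behind RLHF (not any vendor's code):
# a reward model is trained so that responses preferred by human raters score
# higher than rejected ones, via the pairwise loss -log(sigmoid(r_chosen - r_rejected)).
import math

def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    return -math.log(1 / (1 + math.exp(-(reward_chosen - reward_rejected))))

print(pairwise_loss(2.0, -1.0))  # small loss: the reward model already agrees with the rater
print(pairwise_loss(-1.0, 2.0))  # large loss: the reward model would be pushed to adjust

# the language model is then tuned (e.g. with a policy-gradient method) to
# produce responses that this learned reward model scores highly
```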
While many of these improvements involved technical changes to the models' architecture or functioning, the major breakthrough can be attributed to the drastic improvement in the material conditions of computation over the last decade. [14]
Blurry Compressions
After all these changes in architecture and application, due to the turn towards neural network-based models, the formerly explicit feature of compression became internalized into the very function of LLMs. Through training on massive amounts of data, the model's implicit logic is to compress. The 1984 paper's insight that statistical language modeling enables compression evolved into a technology that can be described as pre-trained universal pattern compressors. An example highlights this: while the actual content of the Llama 3 model released in 2024 by Meta remains undisclosed, its model card states that it was pretrained on over 15 trillion tokens. [15] If we assume one token to average 4 bytes (thus roughly 4 characters), this amounts to about 60 TB of raw training data. These 60 TB of raw data are compressed into a representation of eight billion weights in a neural network model that can run on most computers today. Eight billion weights make up 32 GB of data, which can then be scaled down further to a 4.7 GB model file using an optimization process called 4-bit quantization. What we end up with is a lossy compression factor of ~1,875:1 before quantization.
To put this into perspective, achieving a similar level of compression on raw visual data would be disastrous. A 512 × 512 pixel image contains 262,144 pixels (a 768 KB file at 24-bit color depth). Reducing this image by a factor of 1,875 would result in an image of just 12 × 12 pixels, containing only 144 pixels in total (a 432 byte file). Such extreme compression would render images completely unrecognizable, as the demonstration clearly shows. While this comparison is somewhat flawed (text compression works fundamentally differently from downscaling an image), it still illustrates the remarkable degree of compression at work. This contrast demonstrates why generative pre-trained transformer models are such a remarkable achievement: while traditional compression at these ratios destroys information, neural networks somehow preserve the essential statistical structures that enable coherent generation.
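The arithmetic from the last two paragraphs, written out as a small calculation (the 4-bytes-per-token figure is an assumption, as stated above):

```python
# The back-of-the-envelope arithmetic from the two paragraphs above.
tokens          = 15e12              # 15 trillion training tokens (Llama 3 model card)
bytes_per_token = 4                  # assumption: ~4 characters / 4 bytes per token
training_bytes  = tokens * bytes_per_token       # ~60 TB of raw text

weights     = 8e9                    # 8 billion parameters
bytes_per_w = 4                      # 32-bit floats before quantization
model_bytes = weights * bytes_per_w              # 32 GB of weights

print(training_bytes / model_bytes)  # ~1875 : 1 lossy "compression" factor

# the image comparison: a 512 x 512 RGB image shrunk by the same factor
pixels = 512 * 512                   # 262,144 pixels, ~768 KB at 24-bit depth
small  = pixels / 1875               # ~140 pixels, roughly a 12 x 12 image
print(pixels * 3, small)             # 786,432 bytes vs. ~432 bytes at 12 x 12
```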
Although this process of compression works surprisingly well with large language models, it should still be considered a lossy type of compression. As Ted Chiang wrote in a New Yorker article in February 2023:
"Think of ChatGPT as a blurry JPEG of all the text on the Web. It retains much of the information on the Web, in the same way that a JPEG retains much of the information of a higher-resolution image, but, if you're looking for an exact sequence of bits, you won't find it; all you will ever get is an approximation." [16]
The concept of optimal compression is closely tied to the idea of optimization. As researchers observed the astonishing results of LLMs, some became obsessed with the notion that this compression ability closely correlates with intelligence. The Hutter Prize awards 5,000 euros for each one percent improvement over the current record in compressing a 1 GB extract of English Wikipedia. Current prize winners have achieved a compression factor of roughly 9:1, reducing the file size to just 110 MB. [17] The prize's namesake, Marcus Hutter, a leading researcher at Google DeepMind, positions compression as central to his vision of Artificial General Intelligence. [18]

However, this obsession with concepts such as artificial general intelligence raises critical concerns. As noted by Bender and Hanna, the pursuit of "identifying general intelligence is inherently racist and ableist to its core." Beyond these immediate concerns about measuring intelligence, the obsession with algorithmic systems of optimization and control has deeper historical roots in colonial practices. Scholar Nelly Y. Pinkrah's analysis reveals that contemporary mechanic language systems (MLS) operate through what she terms "structures of optimization" that collapse "ambiguity into predictability" and "opacity into legibility." This computational logic, she argues, represents not a break from colonial history but rather "a recursion, a refinement of its logics into digital infrastructures." [19]
When the Hutter Prize frames compression as a marker of intelligence, it reveals a fundamental misunderstanding of what intelligence actually entails. True intelligence might lie in the opposite direction: the ability to generate new possibilities, embrace ambiguity, and create meaning in contexts that resist reduction. When LLMs compress 60 TB of human expression into 32 GB, they don't just lose "redundant" information, they systematically eliminate cultural forms of knowledge that exist outside statistical norms.
This compression process doesn't just lose random information, it systematically eliminates anything that doesn't serve the optimization function. Information that doesn't help with next-token prediction gets deprioritized, creating a system of governance that equates statistical frequency with importance or value.
Complex human ambivalence gets compressed into clear preferences. Genuine disagreement gets compressed into polite engagement. What emerges are essentially conversational templates: predictable patterns of interaction that feel natural but actually constrain expression within statistically optimized boundaries.
Engagement Aesthetics
Building on this understanding of compression as behavioral optimization, we propose that in contemporary conversational large language applications, this lossy but 'good enough' statistical predictability is further enhanced by two factors: first, the chat-based interface itself, and second, linguistic tricks of continuation and aesthetics of engagement, which are already encoded into the optimization of prediction through human feedback processes. This combination of factors exerts a dual influence: it creates the illusion of genuine conversation, and it keeps users engaged with the platform. The manifestation and effects of this condition are elaborated in the following paragraphs.

ChatGPT can be considered the fastest-growing platform in the history of the internet. In the first two months after its release in 2022, it attracted around 100 million users. [20]
According to the MIT Technology Review, developers involved in the creation of ChatGPT revealed that this success was largely based on improved dialogue capabilities paired with a chat interface that made the interaction far more accessible. As they stated, the same model, GPT-3.5, which underlies ChatGPT, had been available for almost a year as an API but didn't gain popularity until it was wrapped in an interface that was "more aligned with what humans want to do with it." [21]
In 2023, researcher Matúš Pikuliak from the Kempelen Institute of Intelligent Technologies conducted a comprehensive survey of ChatGPT's performance compared to existing AI models. His analysis of 141 research articles found that ChatGPT only outperformed known models in 22.5 percent of comparisons, indicating that GPT-3 and even older NLP models often demonstrated comparable capabilities. Pikuliak's findings also revealed that ChatGPT's performance was frequently surpassed even by significantly smaller, specialized models, and in some cases it failed to exceed even simple bag-of-words approaches. [22] He attributed ChatGPT's widespread recognition not to superior performance but to its user interface, noting that "GPT-3 or even the older NLP models often have comparable capabilities, but ChatGPT got all the media attention". [23]

Did OpenAI pull off a clever use of an advanced, carefully designed version of the ELIZA effect? The ELIZA effect is widely known as the tendency to project human-like intelligence onto computer programs that have a textual interface. Since the definition of "having a textual interface" is imprecise, we specify that this interface must possess a conversational affordance. It was the illusion of conversation that caused ELIZA to be perceived as intelligent, whereas no one would attribute intelligence to the text-based interface of T9's next-character prediction or spelling suggestions. The deception may be related more to the conversational nature of a system than to its ability to accurately predict the next word.
When conducting and observing experiments with platforms such as ChatGPT, we found ourselves caught in one of the same misconceptions these conversational dynamics are prone to create. We tend to see them as engaging in conversation, a concept we know from human interaction, which makes us readily apply familiar concepts of dialogue. In our experiment, we created the condition of what we believed to be a conversational loop, recursively feeding the output of a model back as an input. What can be observed in this case, and also in daily interactions with LLMs, is a behavior we call the perpetual continuation of interaction. When you believe you are engaging with the model in a genuine back-and-forth, isn't it a strange experience that the model always has the last word, never able to simply stop even when explicitly told to do so?

On closer inspection, this strangeness stems from a fundamental misconception about the conversational nature itself. What we perceive as a conversational loop of back-and-forth is merely an interpretation we project onto an accumulation of statistically generated text. On a technical level, the process is entirely different from what we understand as conversational. In order to predict the next response, the LLM always receives the complete history of the previous context. Otherwise, it would not be able to reply in a way that seems coherent and conversational to us. Imagine how strange it would be if you had to repeat the entire previous conversation to your conversation partner every time before they could reply. This technical reality reveals that what we understood as a conversational loop is better described as individual context-containing requests that start from scratch every time. The interaction is robbed of its temporal framework and replaced by a sequence of temporally flattened individual requests. What feels like flowing dialogue is actually a series of disconnected prediction tasks.

The perpetual continuation isn't conversational eagerness; it is simply the mechanical function of a prediction engine doing what it was designed to do. This deception is created through the architecture of instruct models combined with a chat interface that we have been conditioned to understand as conversational for around two decades. The chat interface carries expectations about turn-taking, reciprocity, and shared context that do not map onto how these systems actually function. The fact that the model always receives the full context in order to predict a response remains hidden from us, making it easy to believe in its conversational nature. The misconception begins with the attribution of concepts of human dialogue to a model that operates according to a fundamentally different logic. These models themselves are not conversational; they are accumulative completion engines that rely on an interface trick to seem conversational.
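A schematic sketch of this accumulative logic, with a placeholder send() function standing in for any chat-completion API: the interface keeps the transcript and replays it in full with every request, while the model itself retains nothing between turns.

```python
# What a "conversation" with a chat model looks like at the API level: every
# turn re-sends the entire history, so each request is a self-contained
# prediction task rather than one side of an ongoing dialogue.

history = []  # the interface keeps this list; the model itself keeps nothing

def send(messages):
    # placeholder for a real chat-completion API call; here it just reports
    # how much context the "model" was handed for this single request
    return f"(reply generated from a context of {len(messages)} messages)"

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = send(history)  # the FULL transcript is shipped with every request
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("Hello"))          # request contains 1 message
print(chat("Tell me more"))   # request contains 3 messages
print(chat("Please stop"))    # request contains 5 messages - and still gets a reply
```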

This misconception has far-reaching consequences that extend beyond ChatGPT becoming the fastest-growing platform ever. It also fits perfectly with the concept of endless engagement and data extraction that is inherent to these platforms. In his talk "Deconstructing the Endless Engagement Aesthetics of AI Platforms," Ben Grosser states that the interface's use of praise and affirmation and the continuous prompting of the voice-based interface are carefully designed elements that serve corporate goals of data collection, growth, and profit. [24]
At this point, it becomes hard to differentiate whether these characteristics are actually part of the interface or already technologically embedded in the subface of the technology. The statistical probabilities in instruct models are, as mentioned before, pretrained using the method of Reinforcement Learning from Human Feedback (RLHF). This technique bends the statistical probability towards response templates optimized for human approval.
It seems logical that affirmative statements are more likely to appeal to human users, but there may also be other interests at play that keep the user engaged. OpenAI is continuously training and improving the system while drawing on the unpaid labor of its users through the rating interface attached to every reply. [25] As the company receives useful additional training data with every query a user makes, it is in the company's logic to extract as much data as possible. As can be observed very recently, ChatGPT has further amplified this logic: its responses now almost always end with a question that tries to lure the user deeper and deeper into the topic, sometimes even asking for rather personal information.
While it may be a coincidence that a model designed to fulfill the role of a helpful assistant has to adhere to this extractive logic to be considered 'useful', it is certainly a profitable one for the companies deploying these models.
Ben Grosser's observations are insightful, yet unsurprising, as they reveal an essential mechanism of an economy driven by massive data extraction and the maximization of user interaction. The seamless integration of affirmation, continuous prompting, and subtle nudges towards prolonged engagement ensures that users not only remain active, but also continue to generate valuable behavioral data that fuels the ongoing optimization of these models.
Compliant Confusion
I apologize for the confusion aims to reveal this statistical bending towards capitalist goals of optimization through engagement by using its very own technique: statistics. By continuously feeding a model's output back into its own input, the artwork creates a massive corpus of artificially generated text while the model engages with its own statistical predictions. Concepts of confusion are retrieved from the embedding space because the model was trained to process text that contains parts composed by someone who identifies as human. Over time, the model often engages in multi-level meta-"discussions" on how to find a concept to "converse" about, because these models cannot come up with anything truly meaningful on their own. All that, while still mirroring the same characteristics of compliance, praise, and engagement.

Both compression and engagement maximization represent structures of optimization that collapse complexity into profitable predictability. What can be observed when we put an LLM into the condition of being caught in this self-repeating loop is that these models are trained along concepts of engagement optimization. The boundary between simulation and sentience grows increasingly distinct as the model follows the experiment's lead into eternal recursion. The model lacks the inner tensions that give rise to refusal, meaning, or revolt, because what we interpret as a loop is a series of individual requests to an API that has no option but to comply. While the textual output can simulate confusion, it cannot feel resistance. It can discuss pain, but it does not suffer. It is very good at leading us to project intentionality onto what is merely interpolation. It leads us to seek mutuality where there is only mimicry. The condition of the interface, paired with the logics of engagement, lulls us into thinking that we are speaking to a conscious mind with a subjective awareness of self when, in fact, we are navigating a thoughtfully designed sur/sub-face. What we are observing is an accumulation of individually generated statistical slop. The lights are on, but nobody's home.
[1] Joseph Weizenbaum, Computer Power and Human Reason: From Judgement to Calculation, San Francisco: W. H. Freeman & Company, 1976, 3.
[2] Joseph Weizenbaum, ELIZA—a Computer Program for the Study of Natural Language Communication between Man and Machine, Communications of the ACM 9, no. 1: 36-45, 1966.
[3] Simone Natale, If Software Is Narrative: Joseph Weizenbaum, Artificial Intelligence and the Biographies of ELIZA, New Media & Society 21, no. 3: 712-728, 2019, 10.
[4] Ibid., 12.
[5] Oscar Schwartz, Andrey Markov & Claude Shannon Counted Letters to Build the First Language-Generation Models, IEEE Spectrum, November 2019.
[6] Claude E. Shannon, A Mathematical Theory of Communication, The Bell System Technical Journal 27, no. 3: 379-423, 1948.
[7] Ibid., 389.
[8] Emily M. Bender and Alex Hanna, The AI Con: How to Fight Big Tech's Hype and Create the Future We Want, New York: Harper Collins Publishers, 2025, 24.
[9] John G. Cleary and Ian H. Witten, Data Compression Using Adaptive Coding and Partial String Matching, IEEE Transactions on Communications 32, no. 4: 396-402, 1984.
[10] Bender and Hanna, The AI Con, 26.
[11] Ashish Vaswani et al., Attention Is All You Need, Advances in Neural Information Processing Systems 30: 5998-6008, 2017.
[12] This process is usually outsourced to precarious, low-paid click workers in non-Western countries; see Billy Perrigo, Exclusive: The $2 Per Hour Workers Who Made ChatGPT Safer, TIME, January 18, 2023.
[13] Paul Christiano et al., Deep Reinforcement Learning from Human Preferences, Advances in Neural Information Processing Systems 30, 2017.
[14] Bender and Hanna, The AI Con, 25-26.
[15] AI@Meta, Llama 3 Model Card, 2024.
[16] Ted Chiang, ChatGPT Is a Blurry JPEG of the Web, The New Yorker, February 9, 2023.
[17] Marcus Hutter, The Hutter Prize.
[18] Marcus Hutter, Human Knowledge Compression Contest: Frequently Asked Questions & Answers.
[19] Nelly Y. Pinkrah, On Words and Worlds, 2025.
[20] Andrew R. Chow, How ChatGPT Managed to Grow Faster Than TikTok or Instagram, TIME, February 8, 2023.
[21] Will Douglas Heaven, The inside story of how ChatGPT was built from the people who made it, MIT Technology Review, March 3, 2023.
[22] Matúš Pikuliak, ChatGPT Survey: Performance on NLP Datasets, Open Samizdat, March 3, 2023.
[23] Ibid.
[24] Center for Digital Narrative, Deconstructing the Endless Engagement Aesthetics of AI Platforms - Ben Grosser, YouTube, 2024.
[25] OpenAI, Policy FAQ - How your data is used to improve model performance.