Understanding How GenAI Works
Generative AI is the common name for techniques to create new outputs from machine learning models. Generative applications and models are distinguished from other techniques that are designed to classify sources or are merely predictive. Generative AI or GenAI applications can create new outputs like text, images, sounds, or video. These are frequently called synthetic, as in synthetic text or video. GenAI applications are built on special-purpose fine-tuned large, predictive models, some of which are known as large language models or LLMs. These models are artificial neural networks with many billions (or more) of parameters.
GenAI and chatbots are not knowledge bases and they are not search engines; they can be used to retrieve information from sources of knowledge but they are not designed to accurately answer queries. As instruction fine-tuned models, they are designed to respond to conversational prompts from users. These responses can be remarkable: they can reproduce many familiar genres and transform and manipulate inputs. When prompted for factual information, they may generate correct information or plausible knowledge, but they may also output factually incorrect information.
For text applications, GenAI begins to construct its outputs by way of your input, the prompt. This prompt is wrapped up with some other instructions (system prompt) and fed into the neural network. The outputs are called stochastic because they are iteratively selected from probability values that are returned for each new token (a word or part of a word called a subword). This means that normally you will see slightly different outputs for the same prompt. These outputs are the result of pretraining on very large amounts of text and fine-tuning on many samples of outputs that were preferred by users and the creators of the model.
GenAI applications are best understood as pipelines: a modular and layered technological system made up of multiple components that process input and generate output for the next stage of processing. Your inputs are processed through these multiple stages, some of which are quite opaque and not exposed to users, as part of the GenAI pipeline. What are some of the major components in a contemporary GenAI pipeline? The following diagram illustrates several of these components.
genai-pipeline-01.jpg

Inputs, including system and user prompts (as well as prior responses from the model), are fed into the neural network and outputs are typically generated one token (a word or a piece of a word) at a time. The generation of a token involves the prediction of probability values and the selection from among these values, sometimes the most probable next token but many times from further down the range of probable, which is to say, most likely, tokens. These probabilities are predicted from the many, many samples of token sequences found in training and fine tuning data.