Models that modify Probability Distributions, I/O for Models via Web Protcols

Anudha Mittal
3 min readMar 5, 2025

LLMs are models that modify the probability distribution of a set of events, and predict the next event.

In this case, the event is the next token. The token can be a word or a subword.

The consideration around subwords allows us to account for vocabulary that is not in the original vocabulary list, allowing us to work with foreign words and proper nouns.

For the model named as GPT2LLMHead, the input can be a text phrase or a numerical list of token ids.

When the input is a list of token ids, the output, let’s call it x, has several attributes.

The logits are a probability distribution over every token that the model is trained on. Is a larger token size vocabulary brute forcing? Even if it's brute forcing, is it helpful? A larger token size means the probability distribution at each inference step is over more events, increase compute.

The number of total tokens used to train the LLM.

The maximum number of tokens in one forward pass.

ML Ops — Passing data to ML Models via internet protocols: http



import requests

# URL of a streaming or large audio file
audio_url = "https://sample-videos.com/audio/mp3/crowd-cheering.mp3"

response = requests.get(audio_url, stream=True)

# Save the streamed audio to a file
with open("streamed_audio.mp3", "wb") as f:
for chunk in response.iter_content(chunk_size=1024): # Process chunks of 1KB
if chunk:
f.write(chunk) # Save to file

print("Audio streaming complete. File saved as streamed_audio.mp3")

Above snippet shows how to work with streaming data using the requests library.

Load Balancing Algorithms in NGINX

Passing data from web server to application server:

HTTP, text-based, to FASTCGI (or alternative), binary protocol

Another protocol: Memcached

To use Memcached, install the software, and then install the python client, i.e. python interface. Memcached has only a few methods: get, set, delete, increment a value up, decrease a value by 1, and a couple others.

grpc is an http/2 based protocol:

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

No responses yet

Write a response