Words related to questions
question_words = [
"where", "how", "what", "when", "why", "who", "whom", "whose", "which", "whether",
"can", "should", "would", "could", "will", "may", "might", "did", "do", "does",
"explain", "define", "elaborate", "clarify", "specify", "describe", "interpret", "analyze", "identify", "outline",
"how much", "how many", "how often", "how long", "how far", "to what extent", "at what rate", "what percentage", "what amount", "what time",
"suppose", "imagine", "consider", "assume", "hypothesize", "predict", "estimate", "anticipate", "project", "infer",
"why not", "because", "due to", "what causes", "what results", "what happens", "reason", "consequence", "impact", "outcome",
"which one", "either", "neither", "what if", "how about", "should I", "is it", "can I", "do we", "will it",
"why do you think", "how do you feel", "what's your opinion", "what's your perspective", "what do you believe", "what do you recommend",
"how would you rate", "what's better", "what's worse", "how important",
"how to", "what's the solution", "what's the best way", "what options", "how can I fix", "what's the issue", "why doesn't", "what went wrong",
"how do we solve", "how to improve", "what else", "what's new", "what's next", "what's missing", "where else", "how do we start",
"what's out there", "how to begin", "where to look", "how to explore"
]
- The list is shuffled to produce a second list.
- The two lists are paired element-wise. (An alternative would be to compare every word with every other word and plot a heat map.)
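The shuffle-and-pair setup above can be sketched as follows. In the real experiment each word would be embedded with a sentence-embedding model such as all-MiniLM-L6-v2; the `embed()` here is a deterministic toy stand-in, not the real model.

```python
import math
import random

# Toy stand-in for a real embedding model (e.g. all-MiniLM-L6-v2):
# each word maps deterministically to a small random vector.
def embed(word):
    rng = random.Random(word)
    return [rng.uniform(-1, 1) for _ in range(8)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

question_words = ["where", "how", "what", "when", "why", "who"]

shuffled = question_words[:]         # copy the list...
random.Random(0).shuffle(shuffled)   # ...and shuffle the copy to make the second list

# Pair the original list with the shuffled one and score each pair.
sims = [cosine(embed(a), embed(b)) for a, b in zip(question_words, shuffled)]
```

Cosine similarity is bounded in [-1, 1], and a word paired with itself scores exactly 1.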
Results:
- There is one pair with a similarity of 1, because one term happens to be paired with itself after the shuffle. (For a uniform random shuffle, the expected number of such fixed points is exactly 1.)
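The self-match is unsurprising: a uniform random shuffle of any list has, on average, exactly one fixed point regardless of list length. A quick simulation (a sketch, not tied to the actual word list) confirms this:

```python
import random

# Count fixed points (positions where an element lands back on itself)
# across many random shuffles; the mean should be close to 1.
rng = random.Random(42)
n, trials = 100, 10000
total = 0
for _ in range(trials):
    p = list(range(n))
    rng.shuffle(p)
    total += sum(1 for i, x in enumerate(p) if i == x)

mean_fixed_points = total / trials  # ~1.0
```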


Discussion
Visually, the two distributions are somewhat similar.
all-MiniLM-L6-v2 allows negative dot products.
all-MiniLM-L6-v2 has a tighter cluster.
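The negative dot products are expected: for unit-length embeddings the dot product equals the cosine similarity, which ranges over [-1, 1]. A minimal illustration with hand-picked unit vectors (not real model embeddings):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

u = [0.6, 0.8]     # a unit vector
v = [-0.6, -0.8]   # a unit vector pointing the opposite way

# Identical directions give 1; opposite directions give -1.
same = dot(u, u)       # 1.0
opposite = dot(u, v)   # -1.0
```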