Words related to questions
question_words = [
"where", "how", "what", "when", "why", "who", "whom", "whose", "which", "whether",
"can", "should", "would", "could", "will", "may", "might", "did", "do", "does",
"explain", "define", "elaborate", "clarify", "specify", "describe", "interpret", "analyze", "identify", "outline",
"how much", "how many", "how often", "how long", "how far", "to what extent", "at what rate", "what percentage", "what amount", "what time",
"suppose", "imagine", "consider", "assume", "hypothesize", "predict", "estimate", "anticipate", "project", "infer",
"why not", "because", "due to", "what causes", "what results", "what happens", "reason", "consequence", "impact", "outcome",
"which one", "either", "neither", "what if", "how about", "should I", "is it", "can I", "do we", "will it",
"why do you think", "how do you feel", "what's your opinion", "what's your perspective", "what do you believe", "what do you recommend",
"how would you rate", "what's better", "what's worse", "how important",
"how to", "what's the solution", "what's the best way", "what options", "how can I fix", "what's the issue", "why doesn't", "what went wrong",
"how do we solve", "how to improve", "what else", "what's new", "what's next", "what's missing", "where else", "how do we start",
"what's out there", "how to begin", "where to look", "how to explore"
]
- The list is shuffled to produce a second list.
- The two lists are paired element-wise. (An alternative would be to compare every word with every other word and plot a heat map.)
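The shuffle-and-pair setup above can be sketched as follows. In the real experiment each word would be embedded with a sentence-embedding model such as all-MiniLM-L6-v2; the `embed()` here is a deterministic toy stand-in, not the real model.

```python
import math
import random

# Toy stand-in for a real embedding model (e.g. all-MiniLM-L6-v2):
# each word maps deterministically to a small random vector.
def embed(word):
    rng = random.Random(word)
    return [rng.uniform(-1, 1) for _ in range(8)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

question_words = ["where", "how", "what", "when", "why", "who"]

shuffled = question_words[:]         # copy the list...
random.Random(0).shuffle(shuffled)   # ...and shuffle the copy to make the second list

# Pair the original list with the shuffled one and score each pair.
sims = [cosine(embed(a), embed(b)) for a, b in zip(question_words, shuffled)]
```

Cosine similarity is bounded in [-1, 1], and a word paired with itself scores exactly 1.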
Results:
- There is one pair with a similarity of 1, because one term happens to be paired with itself after the shuffle. (For a uniform random shuffle, the expected number of such fixed points is exactly 1.)
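The self-match is unsurprising: a uniform random shuffle of any list has, on average, exactly one fixed point regardless of list length. A quick simulation (a sketch, not tied to the actual word list) confirms this:

```python
import random

# Count fixed points (positions where an element lands back on itself)
# across many random shuffles; the mean should be close to 1.
rng = random.Random(42)
n, trials = 100, 10000
total = 0
for _ in range(trials):
    p = list(range(n))
    rng.shuffle(p)
    total += sum(1 for i, x in enumerate(p) if i == x)

mean_fixed_points = total / trials  # ~1.0
```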


Discussion
Visually, the two distributions are somewhat similar.
all-MiniLM-L6-v2 allows negative dot products.
all-MiniLM-L6-v2 has a tighter cluster.
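The negative dot products are expected: for unit-length embeddings the dot product equals the cosine similarity, which ranges over [-1, 1]. A minimal illustration with hand-picked unit vectors (not real model embeddings):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

u = [0.6, 0.8]     # a unit vector
v = [-0.6, -0.8]   # a unit vector pointing the opposite way

# Identical directions give 1; opposite directions give -1.
same = dot(u, u)       # 1.0
opposite = dot(u, v)   # -1.0
```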