Create Audio Instructions Based on Video?
Can Gen-AI create voice instructions on how to do something?
It’s easier to record a silent video of a task (changing a flat tire, juggling, demonstrating a yoga pose) than to narrate the task while doing it.
MediaPipe can detect the movement someone is carrying out. Can another model use the output of that inference to generate audio instructions for the movement?
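As a rough illustration of the middle step, here is a minimal sketch of turning detected landmarks into an instruction sentence. It is not the notebook's method. It assumes the pose landmarks for two frames have already been extracted into plain dictionaries of normalized (x, y) points. The key names (`left_shoulder`, and so on), the helper functions `joint_angle` and `describe_elbow`, and the 15-degree threshold are all hypothetical choices made for this example. Handing the resulting text to a text-to-speech model is left out.

```python
import math


def joint_angle(a, b, c):
    """Return the angle at point b (in degrees, 0 to 180) formed by a-b-c."""
    radians = (math.atan2(c[1] - b[1], c[0] - b[0])
               - math.atan2(a[1] - b[1], a[0] - b[0]))
    degrees = abs(math.degrees(radians)) % 360
    return 360 - degrees if degrees > 180 else degrees


def describe_elbow(prev, curr, side="left", threshold=15.0):
    """Compare the elbow angle in two frames and return a short instruction.

    prev and curr are hypothetical dicts mapping landmark names to (x, y)
    tuples; threshold is an assumed minimum change in degrees.
    """
    def elbow(frame):
        return joint_angle(frame[f"{side}_shoulder"],
                           frame[f"{side}_elbow"],
                           frame[f"{side}_wrist"])

    change = elbow(curr) - elbow(prev)
    if change > threshold:
        return f"Straighten your {side} arm."
    if change < -threshold:
        return f"Bend your {side} elbow."
    return f"Hold your {side} arm steady."


# Made-up coordinates: the arm hangs straight, then the forearm swings out.
straight = {"left_shoulder": (0.5, 0.3),
            "left_elbow": (0.5, 0.5),
            "left_wrist": (0.5, 0.7)}
bent = {"left_shoulder": (0.5, 0.3),
        "left_elbow": (0.5, 0.5),
        "left_wrist": (0.7, 0.5)}

print(describe_elbow(straight, bent))  # Bend your left elbow.
```

A rule like this covers only one joint. The open question above is whether a generative model could produce richer instructions from the same landmark data.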
Link to videos with movement detection:
Original video: https://www.youtube.com/shorts/DnDH4OshzB4
Another example with a juggling video:
Original video: https://www.youtube.com/shorts/czM2Tib4QAU
Code: https://github.com/Anudha/Yoga/blob/master/MediaPipe_ForYogaPose_GenAI_Public.ipynb
