SketchSpeakAI: Revolutionizing Presentation Creation for Online Meetings and Classes

João Pedro dos Santos
Published in Dev Genius · 3 min read · Mar 21, 2024

--

A while back, I worked on a cool project that brought computer vision into a drawing game. It was simple: you draw something with your hands in front of your webcam, and the game, powered by a neural network, guesses what it is. This experience opened my eyes to the potential of using webcam-detected hand gestures in different areas, especially education.

That’s where the idea for SketchSpeakAI came from. It’s a Python project that takes this concept further, letting your hands do the drawing and your voice do the talking, as AI turns your words into clear, concise PowerPoint slides.

Begin by moving your hands towards the “start recording” button. Once you press it, the system will capture everything you say until you hit “stop recording.”
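To make the button interaction concrete, here is a minimal sketch of how a hand-controlled toggle could work. It assumes a hand tracker already reports a fingertip position in pixel coordinates for each webcam frame; the `Button` and `RecordingToggle` names and the button layout are hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Button:
    """Axis-aligned on-screen button in pixel coordinates (hypothetical layout)."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, px, py):
        """Return True when the point (px, py) lies inside the button."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


class RecordingToggle:
    """Flips the recording state once each time the fingertip enters the button."""

    def __init__(self, button):
        self.button = button
        self.recording = False
        self._was_inside = False  # whether the fingertip was inside on the previous frame

    def update(self, fingertip):
        """Process one frame; `fingertip` is an (x, y) tuple, or None if no hand is seen."""
        inside = fingertip is not None and self.button.contains(*fingertip)
        if inside and not self._was_inside:
            # React only when the fingertip enters, so hovering does not re-trigger.
            self.recording = not self.recording
        self._was_inside = inside
        return self.recording


# Example: the fingertip enters the button, stays, leaves, then enters again.
toggle = RecordingToggle(Button(x=20, y=20, width=120, height=60))
frames = [(300, 200), (50, 40), (60, 45), None, (55, 50)]
states = [toggle.update(point) for point in frames]
```

Reacting only when the fingertip enters the button is a design choice in this sketch: without it, a hand resting on the button would start and stop the recording on every frame.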

For instance, suppose you talk about basketball and say:

“Basketball mixes teamwork and talent, with players aiming to score by getting the ball through the hoop. The NBA, the best basketball league in the world, has stars like Michael Jordan, LeBron James, and Kobe Bryant.”

You also draw a basketball to visually complement your discussion.

The audio is then sent to OpenAI’s Whisper-1 model for transcription. Next, OpenAI’s GPT-3.5-turbo summarizes the transcript according to the following prompt and messages:

prompt = f"I want you to summarize in three small topics what was said in this content and give a title for it, no yapping: {transcript}"
messages = [{"role": "system", "content": "Always translate your response to English language"}, {"role": "user", "content": prompt}]

The “system” role sets standing instructions for the model, separate from the message itself, while the “user” role carries the message you are sending, as if you were chatting with the model directly.
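Putting the two calls together, the transcription and summary steps could look roughly like the sketch below. It assumes the official `openai` Python package (version 1.0 or later) and a `client` created with `OpenAI()`; the function name `transcribe_and_summarize` is my own for illustration, and the project's actual code may be organized differently.

```python
def transcribe_and_summarize(client, audio_path):
    """Transcribe a recording, then ask the chat model for a title and three topics.

    `client` is assumed to be an `openai.OpenAI()` instance (openai>=1.0).
    """
    # Step 1: speech to text with Whisper.
    with open(audio_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    transcript = transcription.text

    # Step 2: build the prompt and the system/user messages described above.
    prompt = (
        "I want you to summarize in three small topics what was said in this "
        f"content and give a title for it, no yapping: {transcript}"
    )
    messages = [
        {"role": "system", "content": "Always translate your response to English language"},
        {"role": "user", "content": prompt},
    ]

    # Step 3: request the summary and return only the generated text.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    return completion.choices[0].message.content
```

With the package installed and the `OPENAI_API_KEY` environment variable set, a call such as `transcribe_and_summarize(OpenAI(), "recording.wav")` would return the summary text, where `recording.wav` is a placeholder file name.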

Following these guidelines, GPT-3.5-turbo then generates a response, for instance:

Title: The NBA: Pinnacle of Basketball Excellence.

Basketball emphasizes teamwork and talent.

Players aim to score by getting the ball through the hoop.

The NBA, featuring stars like Michael Jordan and LeBron James, is the premier basketball league worldwide.

Once the response arrives, it is reformatted into the text layout a PowerPoint slide requires. Combined with your drawing, this produces a slide featuring a title, a summary of the audio, and the drawing.
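As a rough illustration of this last step, the sketch below splits a response like the one above into a title and bullet points, then builds a slide. It assumes the `python-pptx` library, which the article does not name, and the helper names `parse_summary` and `build_slide`, the slide layout, and the picture position are all my own choices rather than the project's.

```python
import re


def parse_summary(response_text):
    """Split a 'Title: ...' response into a (title, bullets) pair."""
    title = ""
    bullets = []
    for raw_line in response_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between topics
        if line.lower().startswith("title:"):
            title = line[len("title:"):].strip()
        else:
            # Drop a leading list marker such as "-", "*", "•", "1." or "1)".
            bullets.append(re.sub(r"^(?:\d+[.)]|[-*•])\s*", "", line))
    return title, bullets


def build_slide(response_text, drawing_path, output_path):
    """Write a one-slide deck with the title, the bullets, and the drawing."""
    # Imported here so parse_summary stays usable without python-pptx installed.
    from pptx import Presentation
    from pptx.util import Inches

    title, bullets = parse_summary(response_text)
    presentation = Presentation()
    # Layout 1 of the default template is "Title and Content".
    slide = presentation.slides.add_slide(presentation.slide_layouts[1])
    slide.shapes.title.text = title

    text_frame = slide.placeholders[1].text_frame
    for index, bullet in enumerate(bullets):
        # The text frame starts with one empty paragraph; reuse it for the first bullet.
        paragraph = text_frame.paragraphs[0] if index == 0 else text_frame.add_paragraph()
        paragraph.text = bullet

    # Place the drawing on the right; position and size are arbitrary choices.
    slide.shapes.add_picture(drawing_path, Inches(6.5), Inches(2), width=Inches(3))
    presentation.save(output_path)
```

Keeping the parsing separate from the slide building makes it easy to check the text handling on its own, since the model's output format can vary from one response to the next.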

This glimpse into SketchSpeakAI is just the beginning, and there’s plenty of room to grow and improve. The idea is to blend cool tech like computer vision and AI to simplify and enhance online meetings and classes, making them more engaging and intuitive for everyone involved.

If you want to know more about the code and how the project works, I recommend visiting the project’s GitHub repository.

I hope you found this article insightful, whether it resonated with you or sparked some curiosity. If you’re keen to dive deeper into the world of AI or just want to connect, here’s how you can get in touch and follow my work:

  • Connect with me on LinkedIn for professional insights and discussions.
  • Visit my portfolio to see a comprehensive showcase of my work across different technologies and disciplines.
  • Follow me on Medium to read more of my articles on various topics.
  • Check out my GitHub to explore the projects I’m working on.
