Authors: Edyan Antonio Cruz Vélez
Last major revision: 2024-02-16
The primary objective of ClassCorder is to provide a streamlined audio recording and transcription service tailored for students and professors. The recorded audio will be converted into searchable text, and Gen AI will generate enhanced metadata (titles, summaries, and images). Lectures will appear in a list with options to navigate to different pages such as the chatbot or quiz pages.
In university life, recording lectures for future reference is common, whether for students with disabilities or learning differences who need additional processing time, or for students who need more flexible study schedules or prefer to learn at their own pace. However, navigating long audio recordings can be tedious, especially for detailed reviews. Time spent searching through an audio-only recording is time that could be spent studying instead. Audio-only recordings also do not help people who are hard of hearing, as they will not be able to fully understand the lecture through the recording.
Users must be able to record lecture audio from within the page and to upload existing audio files. Recorded audio files are passed to Google's Speech-To-Text service to create a text transcript, with an option to choose the lecture language. Once the transcript has been received, it is passed to Gemini Pro to generate a title, a summary, and a prompt that Imagen uses to generate a thumbnail for the lecture.
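The flow described above (audio to transcript, transcript to title, summary, and image prompt, image prompt to thumbnail) can be sketched as follows. This is a minimal illustration rather than the actual implementation: the three callables stand in for the Speech-To-Text, Gemini Pro, and Imagen calls, and the names LectureRecord and process_lecture, as well as the prompt wording, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LectureRecord:
    """Hypothetical container for everything produced for one lecture."""
    language: str
    transcript: str
    title: str
    summary: str
    thumbnail: bytes


def process_lecture(
    audio: bytes,
    language: str,
    transcribe: Callable[[bytes, str], str],   # stand-in for the speech-to-text call
    generate_text: Callable[[str], str],       # stand-in for the text model call
    generate_image: Callable[[str], bytes],    # stand-in for the image model call
) -> LectureRecord:
    """Run one lecture through the transcription and metadata steps."""
    # Step 1: transcribe the audio in the language the user selected.
    transcript = transcribe(audio, language)
    if not transcript.strip():
        raise ValueError("transcription returned no text")

    # Step 2: ask the text model for a title, a summary, and an image prompt.
    title = generate_text(
        f"Write a short title for this lecture:\n{transcript}"
    ).strip()
    summary = generate_text(
        f"Summarize this lecture in a few sentences:\n{transcript}"
    ).strip()
    image_prompt = generate_text(
        f"Write an image prompt for a thumbnail of this lecture:\n{summary}"
    ).strip()

    # Step 3: the image model turns that prompt into the thumbnail.
    thumbnail = generate_image(image_prompt)
    return LectureRecord(language, transcript, title, summary, thumbnail)
```

Passing the three services in as callables keeps the sketch independent of any particular client library and makes each step easy to replace with a fake during testing.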
Lecture recordings, transcripts, and thumbnails will all be stored in Google BigQuery, with user authentication required for access. From the data in the database, a list of recordings will be shown to the user, each with all of its associated data, along with buttons that send the transcript to a different page such as the chatbot or quiz page. Search functionality based on lecture data should be available so that users can find lectures easily.
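As a rough illustration of the search requirement, the sketch below filters an in-memory list of lectures by owner and keyword. It is only a stand-in for a query against the stored data: the Lecture fields and the search_lectures function are hypothetical names, and matching every query word as a case-insensitive substring is an assumed behavior, not a decided one.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Lecture:
    """Hypothetical row holding the stored data for one lecture."""
    owner: str
    title: str
    summary: str
    transcript: str
    recorded_on: date


def search_lectures(lectures: List[Lecture], owner: str, query: str) -> List[Lecture]:
    """Return the owner's lectures that contain every query word, newest first."""
    terms = query.lower().split()
    matches = []
    for lecture in lectures:
        # Only the authenticated owner's lectures are ever considered.
        if lecture.owner != owner:
            continue
        searchable = " ".join(
            [lecture.title, lecture.summary, lecture.transcript]
        ).lower()
        # An empty query has no terms, so it matches all of the owner's lectures.
        if all(term in searchable for term in terms):
            matches.append(lecture)
    return sorted(matches, key=lambda item: item.recorded_on, reverse=True)
```

Filtering by owner before matching mirrors the requirement that lecture data is only reachable after authentication.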
The webpage prioritizes efficient organization and search functionality. A prominent search bar enables quick retrieval of specific lectures, while visually engaging AI-generated image thumbnails aid browsing. Lectures are listed on the page alongside key information including title, summary, and date. The central focus remains on the embedded recording accompanied by a full, searchable transcript synchronized with the recording for easy navigation. To ease the burden on users, the platform should feature a simple recording interface along with an option to upload audio from a local source. Upon uploading the recording, the language of the recording needs to be set before the automated transcription process begins. After this process is successful, users may then either view the transcript or use it on the different pages presented.
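One way to picture the searchable, time-synced transcript is the sketch below, which finds every place a phrase is spoken and returns the timestamps the audio player could jump to. It assumes the transcription step can supply a start time for each word; TimedWord and find_phrase are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedWord:
    """Hypothetical transcript word with the time it starts in the recording."""
    text: str
    start_seconds: float


def _normalize(word: str) -> str:
    # Compare words case-insensitively and ignore surrounding punctuation.
    return word.lower().strip(".,;:!?\"'()")


def find_phrase(words: List[TimedWord], phrase: str) -> List[float]:
    """Return the start time of every occurrence of phrase in the transcript."""
    target = [_normalize(part) for part in phrase.split()]
    if not target:
        return []
    spoken = [_normalize(word.text) for word in words]
    hits = []
    # Slide a window the length of the phrase across the transcript.
    for i in range(len(spoken) - len(target) + 1):
        if spoken[i:i + len(target)] == target:
            hits.append(words[i].start_seconds)
    return hits
```

Each returned value is the start time of the first word of a match, so selecting a search result can seek the embedded recording directly to that point.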
An option to upload video has been considered for addition. However, video takes up far more space than audio, and currently available AI models also struggle to analyze video files longer than 2 minutes. While we could just strip the audio from the video and only use that instead, it remains out of scope as it still involves additional processing, and the preferred recording method is to record audio from within the page.
Another Speech-To-Text model that was considered is OpenAI Whisper, which is fully open source and costs less per minute transcribed than Google's Speech-To-Text service. However, Google's service supports more languages and dialects and is better integrated with other Google Cloud services like BigQuery and Vertex AI.
Generate a design doc template for a page about recording lectures, transcribing them to text, and using Gen AI to create a title, summary, and image for each lecture
Objective
Objective is clear and well defined.
Requirements
Requirements are well structured, clearly defined, relevant, and realistic
Background
The context of the problem is well defined, understandable, and relevant
Design Ideas
The ideas are fully conceptualized, relevant, and feasible
Alternatives Considered
Alternative solutions were developed
turn this into a paragraph
Webpage Layout
Prominent search bar.
Listing of lectures using AI-generated image thumbnails.
Organized display of lecture information: title, summary, date (filter options)
Embedded recording (audio/video).
Full transcript (searchable/time-synced with recording).
Simple recording interface within the platform or guidance on a recommended tool.
Designated storage location for recordings.
Automated triggering of transcription upon upload.
API connection to the chosen AI model.
User-adjustable settings for tailoring AI output (e.g., length of summaries)