Podcast Avatar

Google NotebookLM can quickly summarize PDF files, websites, YouTube videos, audio files, Google Docs, and Google Slides, and turn them into high-quality podcast audio in a two-person dialogue format. Podcast Avatar, developed by TalkingAvatar.ai, is an all-in-one AI video generation application that turns this kind of two-person podcast-style audio into video. It lets users quickly and easily combine their imported videos with two-person podcast-style audio to create new podcast videos.

Currently, Podcast Avatar is available only in the Windows app.

Operation Process

Podcast Avatar offers two methods for generating podcast videos:

- Import one video showing two people and one two-person podcast audio file. The AI diarizes the speakers (identifies who speaks when) and matches each speaker's audio to the corresponding person in the video.
- Import two single-person videos and one two-person podcast audio file. The AI matches each speaker's audio to the appropriate video.

In both cases, the workflow is similar for the user.

Step 1: Select the character video you want to edit and add it to the video track. You can either upload a local video or choose one from the free Library Avatar market provided by the platform.

Step 2: Upload the two-person podcast-style audio and add it to the audio timeline. The application then splits the audio into two audio tracks, one per speaker, and automatically matches each track to the video character it should be synced with. If the video contains multiple character avatars, you will need to select the avatar for each audio track.
(Note: The first time you upload an audio file, diarization may take longer, so please be patient.)
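To make the split in Step 2 more concrete, here is a minimal Python sketch of the general idea behind turning one two-speaker recording into per-speaker tracks. It assumes diarization has already produced (start, end, speaker) segments; the segment format and the function name split_by_speaker are hypothetical and do not describe Podcast Avatar's internal implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical diarization output: (start_seconds, end_seconds, speaker_label).
Segment = Tuple[float, float, str]


def split_by_speaker(samples: List[float], sample_rate: int,
                     segments: List[Segment]) -> Dict[str, List[float]]:
    """Return one track per speaker, each as long as the input.

    Samples outside a speaker's segments become silence (0.0), so every
    track stays time-aligned with the original audio and the video timeline.
    """
    tracks = {spk: [0.0] * len(samples) for _, _, spk in segments}
    for start_s, end_s, spk in segments:
        # Convert segment times to sample indices, clamped to the signal length.
        start = max(0, int(start_s * sample_rate))
        end = min(len(samples), int(end_s * sample_rate))
        tracks[spk][start:end] = samples[start:end]
    return tracks


audio = [0.1] * 10  # toy signal: 10 samples at 2 samples per second (5 seconds)
segments = [(0.0, 2.0, "Host A"), (2.0, 5.0, "Host B")]
tracks = split_by_speaker(audio, 2, segments)
```

Keeping every track at full length, silent wherever the other person is speaking, is one simple way to keep each avatar's lip motion aligned with the shared timeline.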

Step 3: Additional configurations

AI model
- V1.3: the model used in app version 1.3. Fast, with good performance.
- V1.4: the model used in app version 1.4. Fast, with better performance.
- V2.0: the latest model. Slower, but offers the best performance.
- If you have sufficient computing power (such as a high-end NVIDIA or AMD graphics card), we recommend using the latest V2.0 model.
Teeth enhancement
- Enhances teeth clarity, but may slow down processing.
Lip sync intensity
- Controls how widely the mouth opens during speech.
Lip sync smoothing
- Controls the smoothness of mouth movements during speech. Enabling this option reduces jitter or shakiness in lip motion.
Face orientation
- If the face is heavily tilted, it may affect lip sync accuracy.
- In such cases, selecting a large-angle processing mode can improve results.
- For best performance, use a frontal face orientation when possible.
Audio noise reduction
- Reduces background noise in the audio before lip sync is performed.
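The lip sync intensity and smoothing settings above can be pictured with the following Python sketch, which treats lip motion as a list of per-frame mouth-opening values. The 0-to-1 scale, the gain factor, and the exponential moving average are illustrative assumptions; the models the app actually uses are not documented here.

```python
from typing import List


def apply_intensity(openings: List[float], intensity: float) -> List[float]:
    """Scale each frame's mouth opening (higher intensity = wider), clamped to [0, 1]."""
    return [min(1.0, max(0.0, v * intensity)) for v in openings]


def smooth(openings: List[float], alpha: float) -> List[float]:
    """Exponential moving average: a smaller alpha gives smoother, less jittery motion."""
    if not openings:
        return []
    out = [openings[0]]
    for v in openings[1:]:
        # Blend the new frame with the previous smoothed value.
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


jittery = [0.2, 0.8, 0.2, 0.8, 0.2]     # mouth snapping open and shut every frame
wider = apply_intensity(jittery, 1.5)   # larger openings; 0.8 * 1.5 is clamped to 1.0
steadier = smooth(jittery, 0.5)         # frame-to-frame jumps shrink from 0.6 to at most 0.3
```

The trade-off mirrors the settings: more smoothing removes shakiness but can also soften quick, genuine mouth movements.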

When there is only one video on the video timeline, you can use the "Save as My Avatar" function. This saves the video, together with its currently linked voice model, as a quick preset. Note: Audio files on the audio timeline are not saved. Once the preset is saved, you can find it under Avatar Library > My Avatar in the application.

Important Notes

Video Guidelines

Video Requirements
- Format: Import MP4 video files with a recommended resolution of at least 360p to ensure good final output quality.
- Lighting: Avoid strong light or shadows; ensure the subject's face is clearly visible.
- Content: Use stable footage with minimal shaking or rapid movements.
- Subjects: The subject's facial expressions should be natural and easy for the AI to adapt.
Legal and Copyright Requirements
- Usage Authorization: Ensure that all video and audio materials used have proper legal authorization to avoid violating copyright or image rights.
- Privacy Protection: If the video involves real people or their voices, obtain prior consent or confirm the legal use of the material.
Operational Notes (App)
- Videos on the track can be split, trimmed, deleted, undone, or restored.
- When importing multiple videos onto the track, make sure they have the same resolution and aspect ratio. Using videos with a consistent frame rate is also recommended to prevent stuttering or lag in the output.
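
Before importing several clips, you could check them against the requirements above with a small script such as this Python sketch. ClipInfo and check_clips are hypothetical names, the metadata must come from a tool of your choice (for example, ffprobe), and reading "360p" as the shorter side of the frame is an assumption.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List


@dataclass
class ClipInfo:
    # Hypothetical container for clip metadata gathered elsewhere (e.g. ffprobe).
    path: str
    width: int
    height: int
    fps: float


def check_clips(clips: List[ClipInfo], min_short_side: int = 360) -> List[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    problems: List[str] = []
    for c in clips:
        if not c.path.lower().endswith(".mp4"):
            problems.append(f"{c.path}: not an MP4 file")
        # Assumption: "360p" is read as the shorter side of the frame.
        if min(c.width, c.height) < min_short_side:
            problems.append(f"{c.path}: below {min_short_side}p")
    if clips:
        ref = clips[0]  # compare every other clip against the first one
        for c in clips[1:]:
            if (c.width, c.height) != (ref.width, ref.height):
                problems.append(f"{c.path}: resolution differs from {ref.path}")
            if Fraction(c.width, c.height) != Fraction(ref.width, ref.height):
                problems.append(f"{c.path}: aspect ratio differs from {ref.path}")
            if abs(c.fps - ref.fps) > 0.01:
                problems.append(f"{c.path}: frame rate differs from {ref.path}")
    return problems


clips = [
    ClipInfo("host_a.mp4", 1920, 1080, 30.0),
    ClipInfo("host_b.mp4", 1280, 720, 25.0),
]
for problem in check_clips(clips):
    print(problem)
```

Here both clips are 16:9, so only the resolution and frame-rate mismatches are reported.
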
Common Issues and Solutions
- Issue 1: Dark shadows around the face can make lip sync unstable.
- Issue 2: Poor clarity, obstructions over the face, or side-profile angles greater than 20° may result in subpar output.
- Issue 3: Rapid body movements or continuous head shaking can negatively affect the output.
- Issue 4: Heavy facial hair may reduce lip-sync accuracy.
- Issue 5: Very thick lips may result in poor lip-sync performance.
- Issue 6: The application currently has limited support for cartoon or animated characters.
- Issue 7: The application does not currently support lip-syncing for animal characters.

Audio Guidelines

How to generate podcast audio with NotebookLM
- Google NotebookLM: https://notebooklm.google/
- Google NotebookLM tutorial: https://support.google.com/notebooklm#topic=16164070
