add option to provide audio samples for prediction#153
add option to provide audio samples for prediction#153marypilataki wants to merge 4 commits intospotify:mainfrom
Conversation
marypilataki
commented
Nov 20, 2024
- User can either provide path of audio file or an array of audio samples for prediction. Can be useful when we want to transcribe an excerpt of an audio file or when audio loading is done in some other part of our code.
marlonwq
left a comment
There was a problem hiding this comment.
It seems good. Suggestion: You could comment the code.
hyperc54
left a comment
There was a problem hiding this comment.
Hi!
Thanks for the PR, and sorry for the delay in replying
It looks great and would also resolve some asks made in #55
I only added a couple nits but this looks great overall, I'd also appreciate if you could add a unit-test before we merge anything.
I appreciate that given we took a long time to reply you might not be available anymore to iterate on this. I'm happy to help resolving the suggestions I made, just let me know!
| def predict( | ||
| audio_path: Union[pathlib.Path, str], | ||
| audio_path_or_array: Union[pathlib.Path, str, np.ndarray], | ||
| sample_rate: int = None, |
There was a problem hiding this comment.
nit: use Optional[int] type
| melodia_trick: bool = True, | ||
| debug_file: Optional[pathlib.Path] = None, | ||
| midi_tempo: float = 120, | ||
| verbose: bool = False |
There was a problem hiding this comment.
nit: It's a good idea, although basic pitch is already pretty verbose by default. I think it is fine in this PR to add a few logs lines without needing to control these with a new verbose parameter. We can think of controlling the verbosity in future PRs in my opinion :)
| Args: | ||
| audio_path: File path for the audio to run inference on. | ||
| audio_path_or_array: File path for the audio to run inference on or array of audio samples. | ||
| sample_rate: Sample rate of the audio file. Only used if audio_path_or_array is a np array. |
There was a problem hiding this comment.
| sample_rate: Sample rate of the audio file. Only used if audio_path_or_array is a np array. | |
| sample_rate: Mandatory if audio_path_or_array is a np array. it should represent the sample rate of the provided array. Ignored if `audio_path_or_array` is a string |
nit
| def run_inference( | ||
| audio_path: Union[pathlib.Path, str], | ||
| audio_path_or_array: Union[pathlib.Path, str, np.ndarray], | ||
| sample_rate: None, |
There was a problem hiding this comment.
nit: use Optional[int] type
|
|
||
| Args: | ||
| audio_path: File path for the audio to run inference on. | ||
| audio_path_or_array: File path for the audio to run inference on or array of audio samples. |
There was a problem hiding this comment.
nit: It would be great to add information on the expected input shape, and enforce it right at the beginning of the method. It looks like you're merging channels if multiple are provided, it could be worth adding a note about that in the docstring
| audio_path_or_array: The audio to run inference on. Can be either the path to an audio file or a numpy array of audio samples. | ||
| sample_rate: Sample rate of the audio file. Only used if audio_path_or_array is a np array. |
There was a problem hiding this comment.
the same docstring comments apply here too
| with no_tf_warnings(): | ||
| print(f"Predicting MIDI for {audio_path}...") | ||
| if isinstance(audio_path_or_array, np.ndarray) and verbose: | ||
| print("Predicting MIDI ...") |
There was a problem hiding this comment.
| print("Predicting MIDI ...") | |
| print("Predicting MIDI for input audio array of shape XX") |