add option to provide audio samples for prediction by marypilataki · Pull Request #153 · spotify/basic-pitch

marypilataki · 2024-11-20T12:42:52Z

Users can provide either the path to an audio file or an array of audio samples for prediction. This can be useful when we want to transcribe an excerpt of an audio file, or when audio loading is done in some other part of our code.
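To illustrate the excerpt use case, here is a minimal sketch of cutting a time range out of samples that were loaded elsewhere. The helper name `slice_excerpt` and the 16 kHz sample rate are assumptions made for this example and are not part of basic-pitch; the resulting array is what one would hand to the array-accepting `predict` proposed in this PR.

```python
import numpy as np


def slice_excerpt(
    samples: np.ndarray, sample_rate: int, start_s: float, end_s: float
) -> np.ndarray:
    """Return the samples of a mono signal between start_s and end_s (seconds).

    Hypothetical helper, not part of basic-pitch.
    """
    if not 0 <= start_s < end_s:
        raise ValueError("expected 0 <= start_s < end_s")
    # Convert times to sample indices (truncating towards zero).
    start = int(start_s * sample_rate)
    end = int(end_s * sample_rate)
    return samples[start:end]


# One second of silence stands in for audio loaded by other code.
sample_rate = 16000  # assumed value for the example
samples = np.zeros(sample_rate, dtype=np.float32)

# Keep only the middle half second.
excerpt = slice_excerpt(samples, sample_rate, 0.25, 0.75)
```

The excerpt and its sample rate travel together, which is why the PR adds a `sample_rate` argument alongside the array input.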

marlonwq

It seems good. Suggestion: You could comment the code.

hyperc54

Hi!
Thanks for the PR, and sorry for the delay in replying.
It looks great and would also resolve some of the asks made in #55.

I only added a couple of nits, but this looks great overall. I'd also appreciate it if you could add a unit test before we merge anything.

I appreciate that, given how long we took to reply, you might no longer be available to iterate on this. I'm happy to help resolve the suggestions I made, just let me know!

hyperc54 · 2025-10-29T13:35:44Z

basic_pitch/inference.py

 def predict(
-    audio_path: Union[pathlib.Path, str],
+    audio_path_or_array: Union[pathlib.Path, str, np.ndarray],
+    sample_rate: int = None,


nit: use Optional[int] type
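As a sketch of what this nit asks for: a parameter that defaults to `None` is annotated `Optional[int]` rather than `int`. The function below is a hypothetical stand-in with a simplified signature (the array type is left out to keep the example dependency-free); it is not the real `predict`.

```python
import pathlib
import typing
from typing import Optional, Union


def predict_stub(
    audio_path: Union[pathlib.Path, str],
    sample_rate: Optional[int] = None,  # Optional[int] because None is a valid value
) -> Optional[int]:
    """Hypothetical stand-in for predict; it only echoes sample_rate back."""
    return sample_rate


# The annotation now matches the default, so type checkers accept both calls.
without_rate = predict_stub("song.wav")
with_rate = predict_stub("song.wav", 22050)
hints = typing.get_type_hints(predict_stub)
```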

hyperc54 · 2025-10-29T13:37:59Z

basic_pitch/inference.py

    melodia_trick: bool = True,
    debug_file: Optional[pathlib.Path] = None,
    midi_tempo: float = 120,
+    verbose: bool = False


nit: It's a good idea, although basic pitch is already pretty verbose by default. I think it is fine in this PR to add a few log lines without controlling them through a new verbose parameter. We can think about controlling the verbosity in future PRs, in my opinion :)

hyperc54 · 2025-10-29T13:39:22Z

basic_pitch/inference.py

    Args:
-        audio_path: File path for the audio to run inference on.
+        audio_path_or_array: File path for the audio to run inference on or array of audio samples.
+        sample_rate: Sample rate of the audio file. Only used if audio_path_or_array is a np array.


Suggested change

-        sample_rate: Sample rate of the audio file. Only used if audio_path_or_array is a np array.
+        sample_rate: Mandatory if audio_path_or_array is a np array. It should represent the sample rate of the provided array. Ignored if `audio_path_or_array` is a string.

nit
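The "mandatory if array, ignored otherwise" contract in the suggested docstring could be enforced with a small check like the one below. This is only a sketch: `check_sample_rate` is a hypothetical helper name and the error messages are made up for the example; the PR itself may handle this differently.

```python
import pathlib
from typing import Optional, Union

import numpy as np


def check_sample_rate(
    audio_path_or_array: Union[pathlib.Path, str, np.ndarray],
    sample_rate: Optional[int],
) -> Optional[int]:
    """Return the sample rate to use for array input, or None for path input.

    Hypothetical helper, not part of basic-pitch.
    """
    if isinstance(audio_path_or_array, np.ndarray):
        # Raw samples carry no rate of their own, so the caller must supply it.
        if sample_rate is None:
            raise ValueError("sample_rate is required when passing a numpy array")
        if sample_rate <= 0:
            raise ValueError("sample_rate must be a positive integer")
        return sample_rate
    # For a file path the rate comes from the file, so the argument is ignored.
    return None
```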

hyperc54 · 2025-10-29T13:39:39Z

basic_pitch/inference.py

 def run_inference(
-    audio_path: Union[pathlib.Path, str],
+    audio_path_or_array: Union[pathlib.Path, str, np.ndarray],
+    sample_rate: None,


nit: use Optional[int] type

hyperc54 · 2025-10-29T13:43:49Z

basic_pitch/inference.py


    Args:
-        audio_path: File path for the audio to run inference on.
+        audio_path_or_array: File path for the audio to run inference on or array of audio samples.


nit: It would be great to add information on the expected input shape, and to enforce it right at the beginning of the method. It looks like you're merging channels if multiple are provided; it could be worth adding a note about that in the docstring.
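One possible way to enforce the shape up front and document the channel merge is sketched below. The accepted layouts, `(n_samples,)` or `(n_channels, n_samples)`, are an assumption for this example; the PR does not state which axis holds the channels, and `to_mono` is a hypothetical helper name.

```python
import numpy as np


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Validate the input shape and downmix to mono.

    Assumed layouts (not confirmed by the PR):
      (n_samples,)             mono, returned as float32
      (n_channels, n_samples)  channels are averaged into one
    """
    if audio.ndim == 1:
        mono = audio
    elif audio.ndim == 2:
        # Merge channels by averaging across the channel axis.
        mono = audio.mean(axis=0)
    else:
        raise ValueError(
            f"expected a 1-D or 2-D array of samples, got shape {audio.shape}"
        )
    if mono.size == 0:
        raise ValueError("audio array is empty")
    return mono.astype(np.float32)
```

Failing early with the offending shape in the message makes a wrong layout easier to diagnose than an error raised later, deep inside inference.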

hyperc54 · 2025-10-29T13:44:27Z

basic_pitch/inference.py

+        audio_path_or_array: The audio to run inference on. Can be either the path to an audio file or a numpy array of audio samples.
+        sample_rate: Sample rate of the audio file. Only used if audio_path_or_array is a np array.


the same docstring comments apply here too

hyperc54 · 2025-10-29T13:45:23Z

basic_pitch/inference.py

    with no_tf_warnings():
-        print(f"Predicting MIDI for {audio_path}...")
+        if isinstance(audio_path_or_array, np.ndarray) and verbose:
+            print("Predicting MIDI ...")


Suggested change

-            print("Predicting MIDI ...")
+            print("Predicting MIDI for input audio array of shape XX")
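The `XX` placeholder in the suggestion could be filled from the array's `shape` attribute. A minimal sketch, assuming a hypothetical helper `describe_input` that builds the message for either kind of input:

```python
import pathlib
from typing import Union

import numpy as np


def describe_input(audio_path_or_array: Union[pathlib.Path, str, np.ndarray]) -> str:
    """Build the progress message for a path or an array of samples.

    Hypothetical helper, not part of basic-pitch.
    """
    if isinstance(audio_path_or_array, np.ndarray):
        # Arrays have no file name, so report their shape instead.
        shape = audio_path_or_array.shape
        return f"Predicting MIDI for input audio array of shape {shape}..."
    return f"Predicting MIDI for {audio_path_or_array}..."


array_message = describe_input(np.zeros((2, 100)))
path_message = describe_input("song.wav")
```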

add option to provide audio samples for prediction · 35c8d57

marlonwq suggested changes Nov 29, 2024

add comments · f5b5dc9

marlonwq approved these changes Dec 2, 2024

marypilataki added 2 commits December 4, 2024 12:19

update readme · 450b8ff

add verbose param for printing · 2492048

hyperc54 reviewed Oct 29, 2025

marypilataki wants to merge 4 commits into spotify:main from marypilataki:audio_samples_for_prediction


3 participants
