Replies: 8 comments 3 replies
-
Hi @ceperman, I formatted your table, as I couldn't easily read it. To be honest, as this is a comparison between BirdNET Analyzer and Merlin, the best place for feedback about your experience is https://github.com/kahst/BirdNET-Analyzer/discussions

Regarding your last paragraph: Chirpity has two Nocmig models and BirdNET available for detection, but it does not use a BirdNET web service (there isn't one, to my knowledge*). When using the BirdNET option, it uses the same model as BirdNET Analyzer (v2.4) ported to JavaScript. It can be run offline and should give identical results. If it doesn't, you can raise a bug report - please share the audio file if you do.

* BirdNET is available here: https://birdnet.cornell.edu/api/, but this is indeed a very old version of BirdNET, and despite it having /api in the URL, I don't think the endpoints are documented.
-
Hi @Mattk70, in as much as you can see this as a comparison between BirdNET and Merlin, you're right that it probably belongs elsewhere, and I may do that. I thought it would be interesting for Chirpity/BirdNET users to understand that when it comes to AI identification, opinions differ even between AIs from the same stable.

Perhaps more significant is the difference between BirdNET and what I could hear. I'm no expert, but I know a blackbird and a pheasant when I hear one; BirdNET did pick them up, but only with a low confidence (around 0.2). In order to catch these birds and others that were clearly present, I would have to lower the Chirpity confidence limit to 0.2, which would pick up many false positives and give me a lot more work eliminating them. Chirpity obviously does make this a lot easier to do, but it would be operating at a much lower confidence level than would usually be recommended. At a more usual level of 0.7 I would have picked up only the Green Woodpecker and missed the other 8 species that were present. I'm not sure where this leaves us, other than pointing out that using a high confidence level means eliminating false positives at the cost of more false negatives (genuine birds missed). To repeat what I said earlier, my gut feel is that BirdNET is much better at identifying isolated birds than birds all jumbled together, as in a dawn chorus. So in the latter case, use a lower confidence level.

Re. what I said about Chirpity using the BirdNET web service: I know that BirdNET has a web interface, and admittedly I was guessing that you used this somehow, purely because Chirpity is getting different results from my desktop version. But as far as I can tell I'm also using v2.4 (BirdNET-Analyzer doesn't have a -version option, but 2.4 is mentioned in the Readme.adoc file), so I don't know why I'm getting different results. I've included the files in question (mp3 attachments don't appear to be supported, so I've zipped them).

Dawn chorus:
-
Thanks @ceperman, I think it's generally accepted that all AI models struggle when presented with a busy soundscape of overlapping species' calls. The very best performing ensemble models, such as those that win BirdCLEF, achieve < 70% accuracy even after deploying many bespoke tricks to optimise for the test sounds. I am not sure of the BirdNET v2.4 model's "soundscape" accuracy, but I suspect it will be closer to 50% (maybe @kahst can comment?).

I looked at both files in Chirpity vs. BirdNET and can account for the different results. The main difference is that Chirpity does not report all the detections at a specific timecode, only the top one. (You can see the others if you look at the results for a single species - there will be a clickable grey circle indicating there are additional results.)

The other difference you will notice is that the reported confidence differs very slightly. This is due to slightly different floating-point rounding when your files are resampled to fit BirdNET's expected 48 kHz. One of your files' rates is 44.1 kHz, the other an unusual 46 kHz. Neither gives a nice decimal when you convert (e.g. 48/44.1 = 1.0884353741...). Python's resampling algorithm differs from that in ffmpeg, so they round in slightly different ways. For practical purposes it does not make a difference, and you don't see any difference at all if you use a sample rate that divides nicely into 48. I've shown in the table below how that affects the results:
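As a side note, if you want to see the rounding effect directly, here's a minimal sketch - an illustration only, not the code Chirpity or BNA actually use - comparing two standard resamplers on the same clip. The file name is a placeholder, and it assumes a mono 44.1 kHz WAV with NumPy/SciPy installed:

```python
# Illustration only: two legitimate resampling algorithms applied to the same
# 44.1 kHz clip produce very slightly different samples, which can nudge a
# model's confidence scores. "clip_44100.wav" is a placeholder file name.
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample, resample_poly

rate, audio = wavfile.read("clip_44100.wav")   # expects 44100 Hz, mono
audio = audio.astype(np.float32)

# 48000/44100 reduces exactly to 160/147, so a polyphase resampler can use
# integer up/down factors; the anti-alias filter design still differs
# between implementations.
poly = resample_poly(audio, up=160, down=147)

# FFT-based resampling to the same nominal rate, with different rounding.
fft = resample(audio, int(round(len(audio) * 160 / 147)))

n = min(len(poly), len(fft))
print("max per-sample difference:", np.max(np.abs(poly[:n] - fft[:n])))
```

The per-sample differences are tiny, which is why the confidences only diverge in the trailing digits.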
-
@Mattk70 Thanks for the extremely detailed response and analysis. I've still a lot to learn about Chirpity!

FYI, the file sampling rates: the 44.1 kHz file was created using a commercial mp3 recorder, and this rate is fairly typical for CD-quality recording. The other was from my home-grown recorder, which creates WAV format at 24 kHz, 16-bit, mono. Because they are smaller, I attached mp3 versions that I'd created some while ago by exporting from Audacity, using its default export values. When using Chirpity, I process the WAV files.

Did you get to look at the Barn Owl file 03395116.mp3? I still see a significant difference in the owl detection between BirdNET used from the command line (77%) and via Chirpity (48%).

BTW, you said "...resampled to fit BirdNET's expected 48 kHz...". I'm not aware of this requirement. When I use BirdNET I don't do anything special with the input files. Can you explain?
-
I did look at the Barn Owl file. If I run the mp3 file in BirdNET Analyzer, it picks up the Barn Owl at 47%. I can see from the BirdNET csv that you got 77% when analysing the original wav file. Is this a coincidence, or did you compare predictions from the WAV in BNA to predictions from the mp3 in Chirpity? If you think the wav file shows a discrepancy, maybe share the wav file?

Re resampling: BirdNET requires audio with a 48 kHz sample rate. Both the Chirpity and BirdNET Analyzer applications resample the audio internally to match that.

As a side note, I misread the properties of the barn owl file you shared: 46 kbps is the bitrate; the sample rate is actually 24 kHz. A file this heavily compressed has a lot of compression artefacts and doesn't provide the full frequency range used by BirdNET (0-15 kHz) for its predictions. I suspect this is the reason the results from the WAV and the mp3 differ so much.

An entirely different possibility is that you had applied audio filters in Chirpity and enabled "send filtered audio for analysis" in the settings. That will definitely result in differences.
-
I used the WAV file for both, no filters. I've attached it. I did a confidence comparison between Chirpity and BNA with the Barn Owl file, using (a) the original WAV file, (b) the Audacity mp3 export I posted previously, and (c) the WAV exported from Audacity to mp3, 24 kHz mono:
The last file was created out of interest. Even though the WAV file is 24 kHz mono, an equivalent mp3 export is obviously of such poor quality that the Barn Owl is lost. I said originally that the owl call was so faint that I couldn't even hear it, so perhaps it's understandable.

You mention that BirdNET uses up to 15 kHz for analysis. For my recorder I've implemented a 24 kHz sampling rate, mostly to keep the file sizes small, which gives me an upper frequency of around 11 kHz. I'd looked at sample sonograms and didn't see birds producing sounds above this. Are you suggesting it's not high enough? I can increase my sampling rate, obviously at the expense of file size. This is a consideration because, depending on the battery capacity, I might have the recorder deployed for weeks at a time, and then SD card capacity may be a limiting factor.
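For reference, the arithmetic behind my 11 kHz figure (my own working; only the 0-15 kHz figure comes from the reply above): the Nyquist limit says a recording can only represent frequencies up to half its sampling rate, i.e.

f_max = f_s / 2 = 24 kHz / 2 = 12 kHz

and the recorder's anti-aliasing filter has to roll off a little below that, which is where the roughly 11 kHz usable ceiling comes from. So anything BirdNET looks at between about 12 and 15 kHz is simply absent from a 24 kHz file.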
-
I've just analysed some of my recent 24 kHz files using both BirdNET-Analyzer and Chirpity and got significantly different results. Reading what you said about 48 kHz files, I looked at the BNA docs on GitHub. They don't state a requirement for 48 kHz input, but do say:

"Model V2.4 uses the following settings: 48 kHz sampling rate (we up- and downsample automatically and can deal with artifacts (sic) from lower sampling rates)."

So both Chirpity and BNA will resample my 24 kHz files to 48 kHz. What if they resample differently? I resampled a test file myself to 48 kHz using ffmpeg and had Chirpity and BNA analyse it. Guess what - both gave (virtually) identical results. What's more, the Chirpity results were the same for both the 24 kHz and 48 kHz files, which isn't surprising since we resampled in the same way. But it does mean that the reason for the differences is that BNA would appear to use a different upsampling mechanism from Chirpity/ffmpeg.

Moreover, the Chirpity results are far more convincing. In the sample I used, there was just one bird, a Song Thrush singing many times. Chirpity gave multiple Song Thrush detections, plus a few outliers which I could eliminate. BNA gave just one detection - a Tawny Owl, which could be tracked down to the first couple of notes of a Song Thrush song (to be fair, it did sound quite like a Tawny Owl, and I've attached it). No mention of a Song Thrush.

My lessons from this are: (a) if possible, I will increase the sampling rate of my recorder to 48 kHz; (b) if using lower-sampling-rate files, I will only analyse them through Chirpity; or (c) for use with BNA directly, I will resample them to 48 kHz with ffmpeg myself, as sketched below.
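In case anyone wants to reproduce (c), here's a minimal sketch of the pre-resampling step driven from Python. The file names are placeholders and it assumes ffmpeg is on your PATH; running the equivalent ffmpeg command directly works just as well:

```python
# Sketch of lesson (c): resample recordings to 48 kHz with ffmpeg before
# handing them to BNA, so both tools see identical 48 kHz input.
# File names are placeholders; requires ffmpeg on the PATH.
import subprocess
from pathlib import Path

def to_48k(src: Path) -> Path:
    """Write a 48 kHz copy of src alongside it and return the new path."""
    dst = src.with_name(src.stem + "_48k.wav")
    # -ar 48000 sets the output sample rate; -y overwrites an existing file.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src), "-ar", "48000", str(dst)],
        check=True,
    )
    return dst

if __name__ == "__main__":
    print(to_48k(Path("song_thrush_24k.wav")))
```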
-
I've just seen your post in "Why are the Chirpity results from BirdNET sometimes different from those given when using BirdNET Analyser?"! - although I'm not sure it explains why I saw such wildly different results.
-
I've built myself a sound recorder that I can deploy remotely to record birds over a period of days or weeks. It alternates between recording and sleeping (both periods configurable), creating files on an SD card which I can then analyse back home using BirdNET-Analyzer, which I have locally installed on my PC (i.e. not the web version). I'm going to be relying on BN to do the identifications, and I'm looking at Chirpity to see how it can enhance the analysis process. The record/sleep cycle is sketched below.
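For anyone curious, the record/sleep cycle works along these lines - a purely illustrative Python sketch, since the real recorder is custom hardware; the library, file names, and durations are stand-ins (the 3-minute/30-minute values match the second example further down):

```python
# Illustrative sketch of the record/sleep duty cycle described above.
# The actual recorder is custom hardware; sounddevice, the durations, and
# the file names here are stand-ins.
import time
import sounddevice as sd
from scipy.io import wavfile

RATE = 24000          # recorder's native sample rate (Hz)
RECORD_SECS = 180     # record for 3 minutes...
SLEEP_SECS = 1620     # ...then sleep until the next 30-minute slot

clip = 0
while True:
    audio = sd.rec(RECORD_SECS * RATE, samplerate=RATE, channels=1, dtype="int16")
    sd.wait()                                    # block until recording finishes
    wavfile.write(f"clip_{clip:04d}.wav", RATE, audio)
    clip += 1
    time.sleep(SLEEP_SECS)
```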
BirdNET-Analyzer appears to give credible results - that is, until I compare them with Merlin (the phone app I use when I'm out and about) and sometimes the Mk1 ear. I'm interested to know how others feel about its accuracy and whether they have made their own comparisons.
To put the following examples in context: I live in Warwickshire, UK.
First example
I've made some dawn chorus recordings over the years for a local website, and identified what I could by ear (this was before AI identification was around, or at least before I discovered it). Having found Merlin and BirdNET I analysed some of them for interest.
See (or hear!) https://www.oakleywood.org.uk/2020/05/dawn-chorus-2020/ (the second recording)
This recording was made with the equipment mentioned on the web page.
I played it with Merlin listening, and also ran it through BirdNET-Analyzer. Merlin largely agreed with what I could hear; BN at 0.7 confidence detected just the Green Woodpecker at 33 secs in. I'm not doubting this identification (confidence 0.9320); I was just surprised at everything it missed. This is the comparison table I made:
Admittedly I was using a very low confidence threshold for BN there (this was before I knew what sort of level I should typically be using - perhaps nothing less than 0.7).
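To make the threshold trade-off concrete, here's a small illustrative sketch. Only the 0.9320 Green Woodpecker score is real; the other values are hypothetical stand-ins for species BN scored low:

```python
# Illustration of the threshold trade-off: raising the confidence cutoff
# removes false positives but also drops genuine, low-scoring detections.
# Only the 0.9320 Green Woodpecker score is real; the rest are hypothetical.
detections = [
    ("Green Woodpecker", 0.9320),
    ("Blackbird", 0.22),   # clearly audible, but scored low
    ("Pheasant", 0.20),    # clearly audible, but scored low
]

for threshold in (0.7, 0.2):
    kept = [species for species, conf in detections if conf >= threshold]
    print(f"threshold {threshold}: {kept}")

# threshold 0.7: ['Green Woodpecker']
# threshold 0.2: ['Green Woodpecker', 'Blackbird', 'Pheasant']
```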
Second example
This was made recently with my home-grown, still-in-development recorder in a rural garden, making a 3-minute recording every 30 minutes. These are the BN results (xN = number of detections in the 3-minute period), with Merlin's identifications alongside for comparison:
| Time slot | BirdNET | Merlin |
| --- | --- | --- |
| 15:10 | Blue Tit, Great Tit (x2) | same |
| 15:39 | Robin (x2) | Blue Tit, Robin |
| 16:09 | Robin | same |
| 20:33 | Tawny Owl (x2) | nothing, although I could clearly hear it |
| 03:54 | Barn Owl | nothing (I couldn't hear it either) |
| 07:49 | Robin (x21), Redwing, Dunnock, Long-tailed Tit | Robin |
| 08:18 | Wren, Robin (x3) | Great Tit, Blue Tit, Robin, Chaffinch |
| 08:47 | Robin (x6) | Robin, Great Tit, Blue Tit, Great Spotted Woodpecker |
| 09:17 | Dunnock, Robin | Robin, Greenfinch (I only heard a Pheasant!) |
| 09:46 | Robin (x16), Pheasant | Robin, Greenfinch, Blackbird |
| 10:16 | Robin (x3) | - |
| 10:45 | Robin (x3), Great Tit (x8), Long-tailed Tit | - |
| 11:15 | Robin (x2) | - |
| 11:44 | Robin (x30), Blue Tit, Dunnock, Water Rail (!!*) | Robin, Blue Tit, Dunnock |
* Over-high gain (perhaps) in the recorder created crackle and distortion of close/loud sounds, which may have accounted for the unexpected Water Rail. Apart from this, the BN results are all quite credible.
However, when compared with the Merlin identifications, it all looks a bit uncertain. Which do you believe? Both these products come from the same stable (the Cornell Lab), but my understanding is that they use different AI implementations. I've used Merlin for some while now and have come to have confidence in its identifications. It alerted me to the presence of Spotted Flycatchers in our local wood. Initially I didn't believe it, but it was persistent in a particular place where I eventually spotted (sic) them.
So, where does that leave me? I know that these products are not infallible, but the disagreement between them is disappointing. My feeling is that Merlin may be better at dealing with overlapping sounds, as in the dawn chorus, and BN happier with discrete sounds. I obviously want to have confidence in BN because that's what I will use for my recordings.
One more thing. I think Chirpity uses the BirdNET web service. I use the desktop version of BirdNET, and it detects the Barn Owl (above) with a 77% confidence level. I ran the recording through Chirpity and it detected nothing at a 70% confidence level, but when I dropped it to 40% it did detect the owl, at 48%. Why is there a difference? This is confusing!