"Towards Scaled Bird Species Automatic Biacoustic Identification" Hervé Glotin (1,2,3), Hervé Goeau (4), Alexis Joly (4), Andreas Rauber (5), Willem-Pier Vellinga (6) (1) Aix Marseille Université, CNRS, ENSAM, LSIS UMR 7296, 13397 Marseille, France, glotin@univ-tln.fr (2) Université de Toulon, CNRS, LSIS UMR 7296, 83957 La Garde, France (3) Institut universitaire de France, 103 Bd St-Michel, 75005 Paris, France (4) Institut National de Recherche en Informatique et Automatique (INRIA),34000 Montpellier, France (5) TUWIEN, Wien, Austria (6) Xeno-Canto Foundation for Nature Sounds, The Netherlands contact : glotin@univ-tln.fr, Automatic bird species identification based on their song or call is one promising method for assessing biodiversity, but it requires improvement. Up to now, only four objective initiatives in the context of worldwide evaluation took place. The 1st one was the ICML4B bird challenge joint to the 2013 int'l Conf. on Machine Learning, initiated by univ. of Toulon within the framework of the CNRS MASTODONS SABIOD project (http://sabiod.org), with the Musée National d'Histoire Naturelle of Paris. It concerned 35 species, and 76 participants submitting 400 runs on the Kaggle interface [1]. The 2nd challenge was conducted by Arizona univ. at the MLSP 2013 workshop, with 15 species and 79 participants [2]. The 3rd challenge in 2013 was organized by univ. Toulon and Biotope, with 80 species from the Provence and 30 teams [3]. These challenges revealed original strategies, but were yet not scaled to large number of species. In collaboration with SABIOD, INRIA, TU and Xeno Canto, a new challenge opened in LifeClef 2014. It goes one step further by (i) significantly increasing the species number by almost an order of magnitude (ii) working on real-world social data contributed by hundreds of recordists (iii) moving to a more usage-driven and system-oriented benchmark by allowing the use of metadata and defining information retrieval oriented metrics [4]. 
Its corpus is composed of about 14k recordings of 501 species from Amazon forests. It is expected to be more challenging because of the high risk of confusion between classes, the high background noise and the high diversity of acquisition conditions (devices, recordist customs, context diversity, etc.). It will therefore probably produce substantially lower scores and leave more room for progress towards building real-world generalist identification tools.

In this communication, we review the methods submitted to this latest challenge and compare them to those of the previous ones. We also discuss the difficulties encountered and the improvements that can be drawn from them for the next big-data challenge on bird bioacoustic classification.

We acknowledge SABIOD MASTODONS, which contributed to [1,2] and partly to [4], along with others detailed in these references.

[1] Glotin, Clark, LeCun, Dugan, Halkias, Sueur, 'Proc. of the 1st workshop on Machine Learning for Bioacoustics', joint with ICML, Atlanta, http://sabiod.org/, ISSN 979-10-90821-02-6, June 2013
[2] Briggs, Raich, Eftaxias, Lei, Huang, 'The ninth annual MLSP competition: Overview', in Proc. of the IEEE MLSP workshop, Southampton, UK, Sept. 2013
[3] Glotin, LeCun, Artières, Mallat, Tchernichovski, Halkias, 'Proc. of Neural Information Processing Scaled for Bioacoustics: from Neurons to Big Data', joint with NIPS, http://sabiod.org/nips4b, ISSN 979-10-90821-04-0, Dec. 2013
[4] Joly, Muller, Goeau, Glotin, Spampinato, Rauber, Bonnet, Vellinga, Fisher, Planque, 'LifeCLEF: Multimedia Life Species Identification', in Proc. of the EMR Workshop, joint with ACM ICMR, Glasgow, April 2014

Subject: Talk
Topics: Soundscape description; Automatic classification
Keywords: Bird species automatic identification; Scaled bioacoustic classification; Amazon forest