  • Here, we provide plankton image data that were sorted with the web applications EcoTaxa and MorphoCluster. The data set was used for image classification tasks as described in Schröder et al. (in preparation) and does not include any geospatial or temporal metadata. Plankton was imaged using the Underwater Vision Profiler 5 (Picheral et al. 2010) in various regions of the world's oceans between 2012-10-24 and 2017-08-08. This data publication consists of an archive containing "training.csv" (list of 392k training images for classification, validated using EcoTaxa), "validation.csv" (list of 196k validation images for classification, validated using EcoTaxa), "unlabeled.csv" (list of 1M unlabeled images), "morphocluster.csv" (1.2M objects validated using MorphoCluster, a subset of "unlabeled.csv" and "validation.csv") and the image files themselves. Each CSV file contains the columns "object_id" (a unique ID), "image_fn" (the relative filename) and "label" (the assigned name). The training and validation sets were sorted into 65 classes using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr). The data set shows a severe class imbalance: the 10% most populated classes contain more than 80% of the objects, and the class sizes span four orders of magnitude. The validation set and an additional set of 1M unlabeled images were sorted during the first trial of MorphoCluster (https://github.com/morphocluster). The images in this data set were sampled during RV Meteor cruises M92, M93, M96, M97, M98, M105, M106, M107, M108, M116, M119, M121, M130, M131, M135, M136, M137 and M138, during RV Maria S. Merian cruises MSM22, MSM23, MSM40 and MSM49, during RV Polarstern cruise PS88b, and during the FLUXES1 experiment with RV Sarmiento de Gamboa.
The following people contributed to the sorting of the image data on EcoTaxa: Rainer Kiko, Tristan Biard, Benjamin Blanc, Svenja Christiansen, Justine Courboules, Charlotte Eich, Jannik Faustmann, Christine Gawinski, Augustin Lafond, Aakash Panchal, Marc Picheral, Akanksha Singh and Helena Hauss. In Schröder et al. (in preparation), the training set serves as a source for knowledge transfer when training the feature extractor. The classification using MorphoCluster was conducted by Rainer Kiko. The labels used are operational and have not yet been matched to the corresponding EcoTaxa classes.
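The CSV layout described for this archive (columns "object_id", "image_fn", "label") can be read with Python's standard library. The following is a minimal sketch that assumes comma-separated files with a header row. The helper names (`read_labels`, `read_ids`, `imbalance_summary`, `is_subset_of_union`) are hypothetical and are not part of the data publication. The sketch computes the two imbalance statistics quoted above and checks the stated subset relation for "morphocluster.csv". How "the 10% most populated classes" is rounded is an assumption: the sketch rounds up, which gives 7 of 65 classes.

```python
import csv
import io
import math
from collections import Counter


def read_labels(csv_file):
    """Return the "label" column of an open CSV file (columns object_id, image_fn, label)."""
    return [row["label"] for row in csv.DictReader(csv_file)]


def read_ids(csv_file):
    """Return the set of "object_id" values of an open CSV file."""
    return {row["object_id"] for row in csv.DictReader(csv_file)}


def imbalance_summary(labels, top_percent=10):
    """Return (share of objects in the top_percent most populated classes,
    orders of magnitude spanned by the class sizes).

    The number of "top" classes is rounded up with integer arithmetic
    (an assumption), e.g. 7 of 65 classes for top_percent=10.
    """
    counts = sorted(Counter(labels).values(), reverse=True)
    n_top = max(1, -(-len(counts) * top_percent // 100))  # ceiling division
    top_share = sum(counts[:n_top]) / sum(counts)
    span = math.log10(counts[0] / counts[-1])
    return top_share, span


def is_subset_of_union(subset_ids, *id_sets):
    """True if every ID in subset_ids occurs in at least one of id_sets."""
    return subset_ids <= set().union(*id_sets)


# Tiny in-memory example in the documented column layout (made-up rows and labels).
example = io.StringIO(
    "object_id,image_fn,label\n"
    "1,images/1.png,copepod\n"
    "2,images/2.png,copepod\n"
    "3,images/3.png,copepod\n"
    "4,images/4.png,detritus\n"
)
labels = read_labels(example)
share, span = imbalance_summary(labels)
```

To check the figures stated for the real data, you could open "training.csv" with `open(..., newline="")` and pass the file to `read_labels`. Likewise, `is_subset_of_union(read_ids(m), read_ids(u), read_ids(v))` on "morphocluster.csv", "unlabeled.csv" and "validation.csv" should return True if the subset relation holds as described.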

  • Plankton was sampled with various nets, from the bottom or 500 m depth to the surface, in many of the world's oceans. Samples were imaged with a ZooScan. The full images were processed with ZooProcess, which generated regions of interest (ROIs) around each individual object and a set of associated features measured on the object (see Gorsky et al. 2010 for more information). The same objects were re-processed to compute features with the scikit-image toolbox (http://scikit-image.org). The 1,433,278 resulting objects were sorted into 93 taxa by a limited number of operators following a common taxonomic guide, using the web application EcoTaxa (http://ecotaxa.obs-vlfr.fr). The archive contains:
    - taxa.csv.gz: table of the classification of each object in the dataset, with columns:
        - objid: unique object identifier in EcoTaxa (integer).
        - taxon: taxonomic name. Ambiguous names are made unique by appending the name of the parent taxon in parentheses.
        - lineage: full taxonomic lineage corresponding to this taxon.
    - features_native.csv.gz: table of morphological features computed by ZooProcess. All features are computed on the object only, not the background. All areas and lengths are in pixels. All grey levels are encoded in 8 bits (0 = black, 255 = white).
      With columns:
        - objid: same as above
        - area: area
        - mean: mean grey
        - stddev: standard deviation of greys
        - mode: modal grey
        - min: minimum grey
        - max: maximum grey
        - perim.: perimeter
        - width, height: dimensions
        - major, minor: length of the major, minor axis of the best-fitting ellipse
        - circ.: circularity: 4*pi*(area/perim.^2)
        - feret: maximal Feret diameter
        - intden: integrated density: mean*area
        - median: median grey
        - skew, kurt: skewness, kurtosis of the histogram of greys
        - %area: proportion of the image corresponding to the object
        - area_exc: area excluding holes
        - fractal: fractal dimension of the perimeter
        - skelarea: area of the one-pixel-wide skeleton of the image
        - slope: slope of the cumulative histogram of greys
        - histcum1, histcum2, histcum3: grey level at quantiles 0.25, 0.5, 0.75 of the histogram of greys
        - nb1, nb2, nb3: number of objects after thresholding at the grey levels histcum1, histcum2, histcum3
        - symetrieh, symetriev: index of horizontal, vertical symmetry
        - symetriehc, symetrievc: same, but after thresholding at level histcum1
        - convperim, convarea: perimeter, area of the convex hull of the object
        - fcons: contrast
        - thickr: thickness ratio: maximum thickness/mean thickness
        - elongation: elongation index: major/minor
        - range: range of greys: max-min
        - meanpos: relative position of the mean grey: (max-mean)/range
        - cv: coefficient of variation of greys: 100*(stddev/mean)
        - sr: index of variation of greys: 100*(stddev/range)
        - perimferet: index of the relative complexity of the perimeter: perim/feret
        - perimmajor: index of the relative complexity of the perimeter: perim/major
    - features_skimage.csv.gz: table of morphological features recomputed with skimage.measure.regionprops on the ROIs produced by ZooProcess. See http://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops for documentation.
    - inventory.txt: tree view of the taxonomy and the number of images in each taxon, displayed as text.
    - map.png: map of the sampling locations, to give an idea of the diversity sampled in this dataset.
    - imgs: directory containing images of each object, named according to the object id (objid) and sorted into subdirectories according to their taxon.
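The taxa.csv.gz table (columns objid, taxon, lineage) can be loaded without extracting it, because Python's `gzip.open` supports text mode. This is a minimal sketch that assumes a comma-separated file with a header row. The helpers `read_taxa`, `split_taxon` and `taxon_counts` are hypothetical. `split_taxon` undoes the disambiguation convention in which the parent taxon is appended in parentheses. The taxon names in the example are illustrative only and are not taken from the dataset.

```python
import csv
import gzip
from collections import Counter


def read_taxa(path_or_file):
    """Map objid (int) -> taxon name from taxa.csv.gz (columns objid, taxon, lineage).

    Assumes a comma-separated file with a header row; accepts a path or a binary file object.
    """
    with gzip.open(path_or_file, mode="rt", newline="") as f:
        return {int(row["objid"]): row["taxon"] for row in csv.DictReader(f)}


def split_taxon(name):
    """Split a disambiguated name "Name (Parent)" into ("Name", "Parent").

    Names without a trailing parenthesised parent are returned as (name, None).
    """
    if name.endswith(")") and " (" in name:
        base, parent = name[:-1].rsplit(" (", 1)
        return base, parent
    return name, None


def taxon_counts(taxa):
    """Number of objects per taxon, most populated first."""
    return Counter(taxa.values()).most_common()
```

For example, `taxon_counts(read_taxa("taxa.csv.gz"))[:10]` lists the ten most populated taxa. You can compare the result against inventory.txt.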
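Several columns of features_native.csv.gz are arithmetic combinations of other columns, as their descriptions state. The sketch below restates those formulas in Python so that they can be checked or recomputed. The function name `zooprocess_derived` is hypothetical. Values recomputed this way may differ slightly from the stored ones, because the sketch does not reproduce ZooProcess's own rounding or perimeter measurement. The synthetic check uses an ideal disc, for which circularity should be 1.

```python
import math


def zooprocess_derived(area, mean, stddev, min_grey, max_grey, perim, major, minor, feret):
    """Recompute derived features from base measurements, following the column
    descriptions of features_native.csv.gz (pixels, 8-bit grey levels)."""
    grey_range = max_grey - min_grey
    return {
        "circ.": 4 * math.pi * area / perim ** 2,   # circularity, 1.0 for a perfect circle
        "intden": mean * area,                      # integrated density
        "elongation": major / minor,                # elongation index
        "range": grey_range,                        # range of greys: max - min
        "meanpos": (max_grey - mean) / grey_range,  # relative position of the mean grey
        "cv": 100 * stddev / mean,                  # coefficient of variation of greys
        "sr": 100 * stddev / grey_range,            # index of variation of greys
        "perimferet": perim / feret,                # perimeter complexity relative to Feret diameter
        "perimmajor": perim / major,                # perimeter complexity relative to major axis
    }


# Synthetic check: an ideal disc of radius 10 px with made-up grey statistics.
r = 10
features = zooprocess_derived(
    area=math.pi * r ** 2, mean=100, stddev=20, min_grey=50, max_grey=250,
    perim=2 * math.pi * r, major=2 * r, minor=2 * r, feret=2 * r,
)
```

The function parameters map one-to-one onto the columns area, mean, stddev, min, max, perim., major, minor and feret. This makes it straightforward to apply the function row by row to the table.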
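Because the imgs directory names each image after its objid and sorts the images into one subdirectory per taxon, an objid-to-image index can be rebuilt from the directory tree alone. This is a minimal sketch under three assumptions: the layout is imgs/<taxon>/<objid>.<extension>, the file stem is exactly the integer objid, and the extension is not constrained. The helper `index_images` is hypothetical. Subdirectory names may not match the taxon strings in taxa.csv.gz character for character, so cross-checking the two is advisable.

```python
from pathlib import Path


def index_images(imgs_dir):
    """Map objid -> (taxon directory name, image path).

    Assumes the layout imgs/<taxon>/<objid>.<ext>; files whose stem is not
    an integer, and files directly under imgs_dir, are skipped.
    """
    index = {}
    for taxon_dir in sorted(Path(imgs_dir).iterdir()):
        if not taxon_dir.is_dir():
            continue
        for img in sorted(taxon_dir.iterdir()):
            if img.is_file() and img.stem.isdigit():
                index[int(img.stem)] = (taxon_dir.name, img)
    return index
```

Keying the index by integer objid lets it be joined directly with the objid column of the CSV tables.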