Datasets

  • Voco Female Swahili

    There is no description for this dataset

  • Makerere Luganda Agricultural Text Data

    The dataset consists of sentences in the Luganda language that solely pertain to the agricultural domain. These sentences cover a wide range of topics within agriculture, such...

  • Coffee and Cashew Nut Dataset

    Our research focuses on using machine learning and drone technology to improve yield estimation in agriculture. We introduce the "Coffee and Cashew Nut Dataset," containing...

  • Makerere University Beans Image Dataset

    This beans dataset was created to provide an open and accessible, well-labeled, sufficiently curated image dataset. This is to enable researchers to build various machine...

  • Cassava root cross-section images

    This dataset contains images of cassava root cross-sections captured by the Makerere University Artificial Intelligence Lab in conjunction with the National Crop Resources...

  • Lacuna Malaria Datasets

    This dataset contains thick and thin blood smear images captured using smartphones on a microscope. The images have been annotated with bounding boxes showing different objects...

  • Sentiment Tagged Parallel Corpus for Luganda and Swahili

    This dataset contains 10,000 parallel sentiment-tagged sentences. English sentences were translated into both Luganda and Swahili. The translations were done by language experts...

  • The Makerere Gendered Corpus: A Gendered English to Luganda Parallel Corpus

    This English-Luganda parallel sentence corpus consists of gendered examples created by a team of researchers from Makerere AI Lab at Makerere University with a team of Luganda...

  • Kiswahili Monolingual Corpus

    This dataset contains 100,000 Kiswahili sentences. We want to thank the team at the Makerere AI and Marconi Labs at Makerere University, TAVODET Youth Development (TYD)...

  • Lumasaba Monolingual Corpus

    Lumasaba, sometimes known as Lugisu, is a Bantu language spoken in the eastern part of Uganda. This dataset contains a total of 39,999 sentences. The sentences are split into two...

  • Luganda Monolingual Corpus

    This dataset contains 100,000 Luganda sentences. Luganda is a Bantu language and is one of the major languages spoken in Uganda. This dataset was compiled by researchers at the...

  • Acoli Monolingual Corpus

    Acoli is a very low-resourced language spoken in parts of Northern Uganda. This dataset contains 40,037 Acoli sentences. The sentences were collected and evaluated by Acoli...