Dataset and Code Repository for Continuous Ethiopian Sign Language

Sample images from the Continuous Ethiopian Sign Language (CESL) Dataset

CESL Dataset Overview

The Continuous Ethiopian Sign Language (CESL) dataset was created to support Ethiopian Sign Language (EthSL) recognition, a domain that lacks publicly available datasets. It contains 1,320 videos from 22 signers (14 men and 8 women) aged 10 to 60, covering a range of signing speeds and styles. The videos cover 30 sentences selected from beginner sign language books, focusing on family, occupations, and daily life.

Data collection was conducted at Fasilo Secondary School and Yekatit 23 Primary School in Bahir Dar, Ethiopia. Signers wore black and were filmed against a green background to improve clarity. The videos were recorded with a Canon Mark IV camera in full HD (1920x1080 pixels) at 25 frames per second.

Dataset Statistics

Resolution: 1920x1080 pixels
Total Videos: 1,320
Vocabulary Size: 65 words
Sentence Count: 30
Duration: 2-15 seconds per video

Suggested Splits

Signer-Independent Split: Tests the model on unseen signers. Data from 16 signers is used for training, while the remaining 6 signers are divided between the validation and test sets, so no signer appears in more than one set.
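
A signer-independent split can be sketched as follows. This is a minimal illustration, not the repository's own split script: the signer IDs 1 to 22, the even 3/3 division of the held-out signers, and the random seed are all assumptions, since the description above only states that 16 signers are used for training and 6 are divided between validation and testing.

```python
import random


def signer_independent_split(signer_ids, n_train=16, n_val=3, seed=0):
    """Partition signer IDs into disjoint train/val/test groups.

    n_val=3 (leaving 3 signers for test) is an assumed division of the
    6 held-out signers; adjust it to match the split you actually use.
    """
    ids = sorted(signer_ids)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    return {
        "train": sorted(ids[:n_train]),
        "val": sorted(ids[n_train:n_train + n_val]),
        "test": sorted(ids[n_train + n_val:]),
    }


# Hypothetical signer IDs 1..22; the real dataset may label signers differently.
splits = signer_independent_split(range(1, 23))
print({name: len(ids) for name, ids in splits.items()})
```

Each video is then assigned to the set that contains its signer, so every signer's videos stay together in one set.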

Unseen Sentence Split: Separates the data into training and testing sets based on sentence content. 25 sentences are used for training and 5 are reserved for testing, which measures how well the model generalizes to sentences it has not seen during training.
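
The unseen sentence split can be sketched in the same way. The metadata layout (a list of dicts with `signer_id` and `sentence_id` keys), the toy records, and the choice of sentences 26 to 30 as the held-out set are hypothetical; the dataset's actual annotation format and held-out sentences may differ.

```python
def unseen_sentence_split(samples, test_sentence_ids):
    """Split samples so that held-out sentences appear only in the test set."""
    held_out = set(test_sentence_ids)
    train = [s for s in samples if s["sentence_id"] not in held_out]
    test = [s for s in samples if s["sentence_id"] in held_out]
    return train, test


# Toy metadata for illustration only: one record per (signer, sentence) pair.
samples = [
    {"signer_id": signer, "sentence_id": sentence}
    for signer in range(1, 23)
    for sentence in range(1, 31)
]

# Assumed held-out sentences; any 5 of the 30 sentence IDs could be used.
train, test = unseen_sentence_split(samples, test_sentence_ids=range(26, 31))
print(len(train), len(test))
```

Filtering on the sentence ID rather than on individual videos ensures that all recordings of a held-out sentence, from every signer, stay out of the training set.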

Download Dataset

Zenodo Link

Download Code and Try

GitHub Link