
The recordings were made in Audacity at a sampling rate of 44,100 Hz, in 32-bit float format, on a single (mono) channel. Each audio file was saved in WAV (Waveform Audio File Format). Recording took place in quiet environments and in a closed, soundproof studio to minimize noise interference.

The number of speakers from each ethnic group is as follows:

| Dialect (ethnic group) | Male | Female |
| --- | --- | --- |
| Javanese | 2 | 3 |
| Sundanese | 5 | 3 |
| Batak | 2 | 2 |
| Balinese | 4 | 5 |
| Minang | 3 | 2 |
| Total | 16 | 15 |

Audio File Naming Format

For example, ID2JF06OR-0001 follows the format:

The first three characters (ID2) indicate the name of the dataset used.

The next single character denotes the ethnic group of the speaker for the uttered sentence: L = Balinese, T = Batak, J = Javanese, M = Minang, S = Sundanese.

The subsequent three characters (F06 in the example) identify the recorded individual; the first of these is F for Female or M for Male.

‘OR’ stands for Original, indicating that the data is not augmented.
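The naming format described above can be decoded programmatically. The following is a minimal Python sketch, not part of the dataset's own tooling. The function name `parse_name`, the `ParsedName` tuple, and the regular expression are hypothetical. The sketch also assumes, based only on the single example ID2JF06OR-0001, that the speaker code is one letter plus two digits, that any two-letter tag other than OR marks non-original data, and that the name ends in a hyphen and four digits whose meaning is not stated here.

```python
import re
from typing import NamedTuple

# Code letters taken from the naming format description.
ETHNIC_GROUPS = {
    "L": "Balinese",
    "T": "Batak",
    "J": "Javanese",
    "M": "Minang",
    "S": "Sundanese",
}
GENDERS = {"F": "Female", "M": "Male"}

# Assumed layout, inferred from the example ID2JF06OR-0001:
# dataset (ID2) + group letter + gender letter + 2 digits + 2-letter tag + "-" + 4 digits
NAME_PATTERN = re.compile(r"^(ID2)([LTJMS])([FM])(\d{2})([A-Z]{2})-(\d{4})$")


class ParsedName(NamedTuple):
    dataset: str
    ethnic_group: str
    gender: str
    speaker_code: str   # e.g. "F06"
    is_original: bool   # True when the tag is "OR"
    suffix: str         # trailing four digits; meaning not documented here


def parse_name(stem: str) -> ParsedName:
    """Split a file name (without the .wav extension) into its fields."""
    match = NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"unexpected file name: {stem!r}")
    dataset, group, gender, digits, tag, suffix = match.groups()
    return ParsedName(
        dataset=dataset,
        ethnic_group=ETHNIC_GROUPS[group],
        gender=GENDERS[gender],
        speaker_code=gender + digits,
        is_original=(tag == "OR"),
        suffix=suffix,
    )
```

Calling `parse_name("ID2JF06OR-0001")` yields a Javanese female speaker with code F06, flagged as original data. Names that do not match the assumed layout raise `ValueError` rather than being silently misread.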
ID2 (Indonesian Dataset 2) is an Indonesian speech dataset featuring dialectal variation, recorded from 31 speakers belonging to five ethnic groups in Indonesia: Javanese, Sundanese, Batak, Balinese, and Minang. The speakers are male and female, aged between 17 and 25 years. The dataset covers 330 sentences from diverse domains, accompanied by manually created transcriptions. In total it contains 10,230 recorded utterances (330 sentences × 31 speakers), spanning 7 hours, 40 minutes, and 48 seconds.
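The totals quoted above can be cross-checked with a little arithmetic. The short Python sketch below assumes that every one of the 31 speakers read all 330 sentences, which is what the figure of 10,230 implies; the variable names are illustrative only.

```python
# Figures taken from the dataset description.
SPEAKERS = 31
SENTENCES = 330

# Assumption: each speaker recorded every sentence exactly once.
total_utterances = SPEAKERS * SENTENCES

# Total duration: 7 hours, 40 minutes, 48 seconds.
total_seconds = 7 * 3600 + 40 * 60 + 48

# Mean length of one recording, in seconds.
mean_seconds = total_seconds / total_utterances

print(total_utterances)        # 10230
print(total_seconds)           # 27648
print(round(mean_seconds, 2))  # 2.7
```

Under that assumption the average recording is about 2.7 seconds long, which is plausible for single read sentences.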
Indonesian Speech, Minang dialect, Javanese dialect, Dialectal in Indonesian, Sundanese dialect, Batak dialect, Balinese dialect
