# Workflow Task Graph Dataset

This dataset contains three sets of task graphs representing different types of task workflows:

- **Elementary** - contains trivial graph shapes, such as tasks with no dependencies or simple fork-join graphs. This set tests how scheduler heuristics react to basic graph scenarios that frequently form parts of larger workflows.
- **IRW** - inspired by real-world workflows, such as machine learning cross-validation or map-reduce.
- **Pegasus** - derived from graphs created by the Pegasus Synthetic Workflow Generators (https://github.com/pegasus-isi/WorkflowGenerator).

All of the provided task graphs are generated to be compatible with ESTEE (https://github.com/It4innovations/estee), which allows simulating their execution on a distributed system under various scheduling heuristics and environment conditions.

## Data Format

Task graphs are stored in `elementary.zip`, `irw.zip` and `pegasus.zip` files that contain JSON representations of the respective task graphs with the following fields:

- `graph_name` - task graph name
- `graph_id` - unique task graph identifier
- `graph` - task graph representation: a list of tasks, where each task is a dictionary with the following keys:
  - `d`: actual task duration in seconds (float)
  - `e_d`: user-estimated task duration in seconds (float)
  - `cpus`: task CPU core requirement (integer)
  - `outputs`: list of task outputs (integers giving the sizes of the outputs in MiB)
  - `inputs`: list of task inputs, each a pair `[task_id, output_index]`; the output index is zero-based

For example, this task graph:

```json
[{"d": 200, "e_d": 180, "cpus": 1, "outputs": [100], "inputs": []},
 {"d": 50, "e_d": 60, "cpus": 2, "outputs": [], "inputs": [[0, 0]]}]
```

contains two tasks. The first requires no input and a single CPU core, has an estimated duration of 180 s and an actual duration of 200 s, and produces a single output of 100 MiB. The second requires task 0's 0-th output as its input, requires 2 CPU cores, produces no output, and has an estimated duration of 60 s and an actual duration of 50 s.

## Parsing the data

In Python, to load the elementary task graph set, run the following snippet:

```python
import pandas as pd

graphs = pd.read_json("./elementary.zip")
```

If you have Estee installed, you can use its `json_deserialize` function to parse the JSON-encoded graphs into the Estee `TaskGraph` data structure:

```python
from estee.serialization.dask_json import json_deserialize

graph_json = graphs.loc[0, "graph"]
graph = json_deserialize(graph_json)
```
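Because each task's `inputs` entry references the producing task by index, simple graph metrics can be computed directly from the JSON, without Estee. The following sketch (a hypothetical helper, not part of the dataset or Estee) computes the critical path length of a task graph using the actual durations `d`, ignoring data-transfer costs; it assumes every task appears in the list after all of its dependencies, which holds for the example above.

```python
# Minimal sketch: critical path length of a task graph in the dataset's JSON
# format. Assumption (not guaranteed by the dataset description): tasks are
# listed in dependency order, i.e. every task follows its input tasks.

def critical_path(tasks):
    """Return the length of the longest dependency chain, in seconds."""
    finish = [0.0] * len(tasks)  # earliest possible finish time per task
    for i, task in enumerate(tasks):
        # A task may start once every task whose output it consumes is done.
        start = max((finish[dep] for dep, _out_idx in task["inputs"]), default=0.0)
        finish[i] = start + task["d"]
    return max(finish, default=0.0)

example = [
    {"d": 200, "e_d": 180, "cpus": 1, "outputs": [100], "inputs": []},
    {"d": 50, "e_d": 60, "cpus": 2, "outputs": [], "inputs": [[0, 0]]},
]
print(critical_path(example))  # 200 s + 50 s -> 250.0
```

The same loop pattern extends naturally to other per-graph statistics, such as total work (`sum(t["d"] for t in tasks)`) or maximum CPU requirement.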
Keywords: task graph, benchmark, workflow, scheduling
