Creating A Dataset
This walkthrough builds a small process graph from F# objects. The goal is to show the model shape rather than every field in the specification.
Dataset
Start with a dataset. Administrative metadata such as a name and description is optional, but the identifier is required and serves as the stable handle for the dataset.
let dataset = Dataset("demo-dataset")
dataset.Name <- Some "Minimal ProcessCore example"
dataset.Description <- Some "One extraction process with nested quality control."
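The same metadata can probably be supplied in a single expression. This is a sketch that assumes the optional `name` and `description` constructor arguments populate the same properties as the assignments above; `datasetInline` is an illustrative name, not part of the walkthrough.

```fsharp
// Assumption: the optional constructor arguments ?name and ?description
// set Dataset.Name and Dataset.Description, like the assignments above.
let datasetInline =
    Dataset(
        "demo-dataset",
        name = "Minimal ProcessCore example",
        description = "One extraction process with nested quality control."
    )
```

The property-assignment form is convenient when metadata arrives later; the constructor form keeps the definition in one place.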
LabProtocol
A protocol describes the method. Formal parameters declare the settings the protocol expects; each one should receive a concrete value when the protocol is executed.
let protocol = LabProtocol()
let temperature = FormalParameter("temperature")
protocol.Name <- Some "Extraction"
protocol.IntendedUse <- Some (DefinedTerm("sample extraction"))
protocol.AddParameter(temperature)
Components are entities that a protocol uses but does not transform, such as machines or reagents. They are attached to the protocol as lab equipment.
let centrifuge = PropertyValue(name = "centrifuge", value = "Eppendorf 5420")
let buffer = PropertyValue(name = "buffer", value = "PBS")
protocol.AddLabEquipment(centrifuge)
protocol.AddLabEquipment(buffer)
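As a compact alternative, the protocol, its parameters, and its components can likely be passed to the constructor together. The sketch assumes the optional `name`, `intendedUse`, `parameters`, and `labEquipment` constructor arguments behave like the property assignments and `Add...` calls above; `protocolInline` is an illustrative name.

```fsharp
// Assumption: the optional constructor arguments mirror the properties and
// Add... calls used above. Reuses temperature, centrifuge, and buffer.
let protocolInline =
    LabProtocol(
        name = "Extraction",
        intendedUse = DefinedTerm("sample extraction"),
        parameters = [ temperature ],
        labEquipment = [ centrifuge; buffer ]
    )
```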
LabProcess
LabProcesses are the core of the process graph. They are concrete executions of a protocol, with specific parameter values, and input and output entities.
First, define the input and output nodes. Each node is either a Material or a Data node.
let leaf = Material("Leaf tissue")
let extractData = Data("raw/extract.csv")
extractData.EncodingFormat <- Some "text/csv"
A LabProcess connects those inputs to outputs. We also attach parameter values to the process, which should correspond to the protocol's formal parameters.
let extraction = LabProcess("Extraction")
let degrees25 = PropertyValue(name = "temperature", value = "25", unit = "degree Celsius", instanceOf = temperature)
extraction.ExecutesProtocol <- Some protocol
extraction.AddInputMaterial(leaf)
extraction.AddOutputData(extractData)
extraction.AddParameterValue(degrees25)
dataset.AddProcess(extraction)
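To see how a parameter value relates back to its formal parameter, the link can be read from the value itself. This sketch assumes the `instanceOf` constructor argument is exposed through the `InstanceOf` property; `linkedParameterName` is an illustrative name.

```fsharp
// InstanceOf is an option, so the value may also be unlinked (None).
// Assumption: the instanceOf constructor argument is stored in InstanceOf.
let linkedParameterName =
    degrees25.InstanceOf
    |> Option.map (fun formalParameter -> formalParameter.Name)
```

A value of `None` here would indicate a parameter value that was created without reference to any formal parameter.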
Nested Datasets
Datasets can contain child datasets. When a child dataset is added, its process nodes are re-canonicalized against the root dataset.
let child = Dataset("qc-dataset")
child.Name <- Some "Quality control"
let qcReport = Data("qc/extract-report.tsv")
let qc = LabProcess("Quality Control")
qc.AddInputData(extractData)
qc.AddOutputData(qcReport)
let threshold = FormalParameter("threshold")
let threshold95 = PropertyValue(name = "threshold", value = "0.95", instanceOf = threshold)
qc.AddParameterValue(threshold95)
child.AddProcess(qc)
dataset.AddPart(child)
The parent process output and the child process input are the same logical Data node: same path, no selector. After AddPart, they are also the same object instance in the root dataset.
let qcInputAfterAttach =
    match qc.Inputs.[0] with
    | DataNode d -> d
    | MaterialNode _ -> failwith "Expected data input"
let sharedDataIdentity =
    obj.ReferenceEquals(extractData, qcInputAfterAttach)
sharedDataIdentity
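The idea of canonicalizing nodes by key can be illustrated without the library. The following is a self-contained sketch, not the library's implementation: `SketchData` and `SketchRegistry` are hypothetical types, and the key is assumed to be the path alone (no selector). Two separately constructed nodes with the same key resolve to one shared instance.

```fsharp
open System.Collections.Generic

// Hypothetical stand-in for a data node; not the library's Data type.
type SketchData(path: string) =
    member _.Path = path
    // Assumption for this sketch: the logical key is just the path.
    member _.Key() = path

// Hypothetical registry that keeps the first instance seen for each key.
type SketchRegistry() =
    let byKey = Dictionary<string, SketchData>()

    // Returns the already registered instance for the node's key,
    // or registers and returns the given node if the key is new.
    member _.Canonicalize(node: SketchData) =
        match byKey.TryGetValue(node.Key()) with
        | true, existing -> existing
        | false, _ ->
            byKey.[node.Key()] <- node
            node

let registry = SketchRegistry()
let parentOutput = registry.Canonicalize(SketchData("raw/extract.csv"))
let childInput = registry.Canonicalize(SketchData("raw/extract.csv"))

// Both bindings now refer to the instance registered first.
let sameInstance = obj.ReferenceEquals(parentOutput, childInput)
```

Keeping one instance per key means that a property attached to the node through one process is visible from every other process that references it.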
The graph is now queryable from either the dataset or any node.
let finalNodes =
    dataset.FinalNodes()
    |> Seq.map (fun n -> n.Key())
    |> Seq.toList
finalNodes
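A few more queries, sketched from the `Dataset` and `Material` member signatures (`AllProcesses`, `AllData`, `AllMaterials`, `DownstreamData`). The binding names are illustrative, and no particular counts are claimed here, since whether dataset-level queries include the processes of child datasets is not stated in this walkthrough.

```fsharp
// Dataset-level queries return ResizeArray collections.
let processCount = dataset.AllProcesses().Count
let dataCount = dataset.AllData().Count
let materialCount = dataset.AllMaterials().Count

// Node-level query: data reachable downstream of the leaf material.
// The optional scope argument is omitted here.
let dataDownstreamOfLeaf = leaf.DownstreamData()
```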
What To Use When
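| Task | API |
|---|---|
| Create a container | Dataset |
| Add a process | Dataset.AddProcess |
| Add nested datasets | Dataset.AddPart |
| Connect materials or files | LabProcess.AddInputMaterial, AddInputData, AddOutputMaterial, AddOutputData |
| Attach process parameters | LabProcess.AddParameterValue |
| Attach characteristics/factors | |
| Attach protocol components | LabProtocol.AddLabEquipment |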