actionETL Features

actionETL is a high performance, highly productive .NET library and NuGet package for easily writing ETL data processing applications in .NET languages such as C# and VB. You can reference and call the library from your application, or use it to create and run an executable, e.g. a console program. It is suitable both for ETL developers with limited .NET programming experience, and for full-time .NET developers that have ETL requirements.

Small applications such as this example are obviously easy to create in most tools:

Worker hierarchy

Here is the corresponding actionETL source code, which moves a file, executes an external command, reads a CSV file, and inserts the records into a database. Database workers such as AdbInsertTarget work with all actionETL database providers (SQL Server, MySQL, ODBC...), which simplifies writing database agnostic code:

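new WorkerSystem()
    .Root(root =>
    {
        var mfw = new MoveFileWorker(root, "Move CSV", SourceFileName, ProcessFileName);

        // The group runs only if the file move succeeded
        new ActionWorker(root, "Group", () => mfw.IsSucceeded, aw =>
        {
            new ExecuteProcessWorker(aw, "Update Timestamps", ExecutableFileName);

            new FileHelperFileSource<Category>(aw, "Read CSV", ProcessFileName)
                // The line linking the source to the insert target is missing from
                // this listing; the call and worker name on the next line are assumed.
                .Output.Link.AdbInsertTarget("Insert Records"
                    , provider.CreateConnectionBuilder(root.Config["SqlServer"])
                    , "dbo.Category");
        });
    });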

Excels with large and complex requirements

Crucially, actionETL also makes it easy to create and maintain high performance ETL applications that handle truly large and complex requirements.

See for instance the Process Incoming Files Example, which creates a custom worker to provide standardized and reliable processing, archiving, and logging of incoming files, for any file type.

High Performance

  • High performance, statically typed, and easily reusable dataflows for reading and writing data rows to and from databases, files etc., handling arbitrarily complex logic
  • Combine in-memory, database, and file based processing to get the best of each approach

  • Parallel scheduler and low overhead, supporting millions of simultaneous or short duration workers, also suitable for micro batches
  • High performance dataflow row and column copying, comparing, and mapping facilities for use when creating custom workers

Effective Database Programming

  • Wraps and extends .NET database providers to simplify writing database agnostic workers and ETL code
  • Dedicated MariaDB™, MySQL™, PostgreSQL®, SQLite, and SQL Server® providers
  • ODBC provider for other databases
  • Easily create reusable and mixed database and non-database logic
  • Supports local transactions
  • Supports many database specific data types (SqlDateTime, SqlGuid, …) end to end, avoiding conversion issues, increasing performance and reducing coding

Familiar ETL concepts

  • Orchestrate work using start constraints in a hierarchical structure, and create highly parallel applications
  • Manipulate, read and write delimited, fixed format, and XLSX files, execute database commands etc.
  • Cleanse, combine and transform data
  • Mix and match workers, via a single base type providing dataflow, ‘control flow’, and grouping functionality, bringing very high flexibility, and minimizing staging and code size
  • Highly capable (as well as replaceable) logging and configuration systems

  • Automatically track and aggregate dataflow error row counts, and log error row contents
  • Distinct recoverable vs. unrecoverable failures, minimizing start constraint bugs
  • Effective debugging, with breakpoints on worker and port state changes, and inspecting and editing rows in flight
  • Many common tasks require very little programming, and are more akin to configuration
  • Familiar concepts and a concise API make it easy to learn

Modern Application Development

  • Compose and encapsulate existing workers into new (‘control flow’ and dataflow) reusable workers
  • Reuse and combine column schemas (i.e. groups of columns) to minimize dataflow bugs, maintenance effort, and code size
  • Many highly flexible workers that can optionally accept code snippets (lambdas), e.g. to configure the dataflow join to perform a greater-than join (i.e. not the default equi-join)
  • Perform ETL testing with any .NET test framework
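
To make the difference between an equi-join and a greater-than join concrete, here is a minimal sketch in plain C# and LINQ. It does not use the actionETL API; the JoinSketch class, the isMatch lambda, and the order and tier rows are all hypothetical and exist only for illustration. The idea it shows is that a join condition can be passed around as a small lambda, so swapping the comparison changes the join type.

```csharp
using System;
using System.Linq;

public static class JoinSketch
{
    public static void Main()
    {
        // Hypothetical sample rows, for illustration only
        var orders = new[] { (Id: 1, Amount: 100), (Id: 2, Amount: 150) };
        var tiers = new[] { (Name: "Bronze", MinAmount: 0), (Name: "Silver", MinAmount: 100) };

        // Equi-join: rows match only when the two key values are equal
        var equi = from o in orders
                   join t in tiers on o.Amount equals t.MinAmount
                   select (o.Id, t.Name);

        // Greater-than join: the join condition is supplied as a lambda
        Func<(int Id, int Amount), (string Name, int MinAmount), bool> isMatch =
            (o, t) => o.Amount > t.MinAmount;

        var greaterThan = from o in orders
                          from t in tiers
                          where isMatch(o, t)
                          select (o.Id, t.Name);

        Console.WriteLine($"equi-join rows: {equi.Count()}");                // 1
        Console.WriteLine($"greater-than join rows: {greaterThan.Count()}"); // 3
    }
}
```

With these sample rows the equi-join matches only order 1 with the Silver tier, while the greater-than join matches every tier whose minimum lies strictly below the order amount.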

  • NuGet package provides simple inclusion in projects
  • Single development model for both using and extending the library, simplifying creating reusable custom functionality, while retaining full performance
  • Offers huge advantages over traditional ETL tools in terms of reusing, composing, encapsulating, testing, refactoring, and maintaining your ETL code, by adopting and promoting modern application development tools and techniques

actionETL runs on Windows .NET Framework 4.6.1+, and is also being ported to .NET Standard / .NET Core to run cross-platform, followed by .NET 5. With .NET Standard in place, we'll also start adding cloud specific features.