Title: | A Lightweight, Flexible, and Fast Data Validation Package that Can Handle All Sizes of Data |
---|---|
Description: | Allows you to define rules which can be used to verify a given dataset. The package acts as a thin wrapper around more powerful data packages such as 'dplyr', 'data.table', 'arrow', and 'DBI' ('SQL'), which do the heavy lifting. |
Authors: | David Zimmermann-Kollenda [aut, cre], Beniamino Green [ctb] |
Maintainer: | David Zimmermann-Kollenda <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.8 |
Built: | 2024-11-05 04:12:29 UTC |
Source: | https://github.com/davzim/dataverifyr |
Programmatically Combine a List of Rules and Rulesets into a Single Ruleset
bind_rules(rule_ruleset_list)
rule_ruleset_list |
a list of rules and rulesets you wish to combine into a single list |
a ruleset which consolidates all the inputs
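This topic lists no example, so the following is a minimal sketch of how a list built up in code might be consolidated. The individual checks and the variable names `checks`, `rs`, and `res` are illustrative assumptions, not taken from the package documentation.

```r
library(dataverifyr)

# Illustrative list mixing a single rule and a ruleset (hypothetical checks)
checks <- list(
  rule(mpg > 10),
  ruleset(
    rule(cyl %in% c(4, 6, 8)),
    rule(hp > 0)
  )
)

# Consolidate all inputs into one ruleset
rs <- bind_rules(checks)

# check_data() returns one row per rule in the combined ruleset
res <- check_data(mtcars, rs)
res
```

This pattern can help when rules are generated in a loop or collected from several places before a dataset is checked once.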
Checks if a dataset conforms to a given set of rules
check_data(
  x,
  rules,
  xname = deparse(substitute(x)),
  stop_on_fail = FALSE,
  stop_on_warn = FALSE,
  stop_on_error = FALSE
)
x |
a dataset, either a |
rules |
a list of |
xname |
optional, a name for the x variable (only used for errors) |
stop_on_fail |
when any of the rules fail, throw an error with stop |
stop_on_warn |
when a warning is found in the code execution, throw an error with stop |
stop_on_error |
when an error is found in the code execution, throw an error with stop |
a data.frame-like object with one row for each rule and its results
rs <- ruleset(
  rule(mpg > 10),
  rule(cyl %in% c(4, 6)), # missing 8
  rule(qsec >= 14.5 & qsec <= 22.9)
)
rs
check_data(mtcars, rs)
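The `stop_on_fail` argument is described above as throwing an error with `stop` when any rule fails. A minimal sketch of how that might be handled in a pipeline follows; the `tryCatch()` wrapper and the `outcome` variable are illustrative assumptions, not part of the package.

```r
library(dataverifyr)

rs <- ruleset(
  rule(mpg > 10),
  rule(cyl %in% c(4, 6)) # mtcars also contains cyl == 8, so this rule fails
)

# With stop_on_fail = TRUE a failing rule raises an error,
# which is caught here and turned into a simple status string
outcome <- tryCatch(
  {
    check_data(mtcars, rs, stop_on_fail = TRUE)
    "passed"
  },
  error = function(e) "failed"
)
outcome
```

Catching the error this way keeps a script running while still recording that the verification did not pass.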
Allows you to add rules and rulesets together into larger rulesets. This can be useful if you want to build a ruleset for one dataset out of checks written for other datasets.
datavarifyr_plus(a, b)

## S3 method for class 'ruleset'
a + b

## S3 method for class 'rule'
a + b
a |
the first ruleset you wish to add |
b |
the second ruleset you wish to add |
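As a minimal sketch of the `+` method for rulesets, two small rulesets can be added and the result checked against a dataset. The rules and the names `rs_engine`, `rs_economy`, and `rs_all` are illustrative assumptions.

```r
library(dataverifyr)

# Two hypothetical rulesets, e.g. written for different purposes
rs_engine <- ruleset(
  rule(cyl %in% c(4, 6, 8)),
  rule(hp > 0)
)
rs_economy <- ruleset(
  rule(mpg > 10)
)

# Add them into one larger ruleset
rs_all <- rs_engine + rs_economy

# The combined ruleset is checked like any other ruleset
res <- check_data(mtcars, rs_all)
res
```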
The detection will be made based on the class of the object as well as the packages installed. For example, if a data.frame is used, it will look if data.table or dplyr are installed on the system, as they provide more speed.
Note the main functions will revert the
detect_backend(x)
x |
The data object, i.e., a data.frame, tibble, data.table, arrow, or DBI object |
a single character element with the name of the backend to use. One of base-r, data.table, dplyr, collectibles (for arrow or DBI objects)
data <- mtcars
detect_backend(data)
Filters a result dataset for the values that failed the verification
filter_fails(res, x, per_rule = FALSE)
res |
a result data.frame as outputted from |
x |
a dataset that was used in |
per_rule |
if set to TRUE, a list of filtered data is returned, one for each failed verification rule. If set to FALSE, a data.frame is returned of the values that fail any rule. |
the dataset with the entries that did not match the given rules
rules <- ruleset(
  rule(mpg > 10 & mpg < 30), # mpg goes up to 34
  rule(cyl %in% c(4, 8)), # missing 6 cyl
  rule(vs %in% c(0, 1), allow_na = TRUE)
)

res <- check_data(mtcars, rules)

filter_fails(res, mtcars)
filter_fails(res, mtcars, per_rule = TRUE)

# alternatively, the first argument can also be a ruleset
filter_fails(rules, mtcars)
filter_fails(rules, mtcars, per_rule = TRUE)
Visualize the results of a data validation
plot_res(
  res,
  main = "Verification Results per Rule",
  colors = c(pass = "#308344", fail = "#E66820"),
  labels = TRUE,
  table = TRUE
)
res |
a data.frame as returned by |
main |
the title of the plot |
colors |
a named list of colors, with the names pass and fail |
labels |
whether the values should be displayed on the barplot |
table |
show a table in the legend with the values |
a base R plot
rs <- ruleset(
  rule(Ozone > 0 & Ozone < 120, allow_na = TRUE), # some missing values and > 120
  rule(Solar.R > 0, allow_na = TRUE),
  rule(Solar.R < 200, allow_na = TRUE),
  rule(Wind > 10),
  rule(Temp < 100)
)

res <- check_data(airquality, rs)
plot_res(res)
Creates a single data rule
rule(expr, name = NA, allow_na = FALSE, negate = FALSE, ...)

## S3 method for class 'rule'
print(x, ...)
expr |
an expression which determines when a rule is good.
Note that the expression is evaluated in |
name |
an optional name for the rule for reference |
allow_na |
does the rule allow for NA values in the data? Default value is FALSE.
Note that when NAs are introduced in the expression, |
negate |
is the rule negated? This only applies to the expression, not to allow_na,
that is, if |
... |
additional arguments that are carried along for your documentation, but are not used. These could be, for example, date, person, contact, comment, etc. |
x |
a rule to print |
The rule values as a list
print(rule): Prints a rule
r <- rule(mpg > 10)
r

r2 <- rule(mpg > 10, name = "check that mpg is reasonable", allow_na = TRUE,
           negate = FALSE, author = "me", date = Sys.Date())
r2

check_data(mtcars, r)

rs <- ruleset(
  rule(mpg > 10),
  rule(cyl %in% c(4, 6)), # missing 8
  rule(qsec >= 14.5 & qsec <= 22.9)
)
rs
check_data(mtcars, rs)
Creates a set of rules
ruleset(...)

## S3 method for class 'ruleset'
print(x, n = 3, ...)
... |
a list of rules |
x |
a ruleset to print |
n |
a maximum number of rules to print |
the list of rules as a ruleset
print(ruleset): Prints a ruleset
r1 <- rule(mpg > 10)
r2 <- rule(mpg < 20)
rs <- ruleset(r1, r2)
rs

rs <- ruleset(
  rule(cyl %in% c(4, 6, 8)),
  rule(is.numeric(disp))
)
rs
Read and write rules to a yaml file
write_rules(x, file)

read_rules(file)
x |
a list of rules |
file |
a filename |
the filename invisibly
read_rules(): reads a ruleset back in
rr <- ruleset(
  rule(mpg > 10),
  rule(cyl %in% c(4, 6, 8))
)
file <- tempfile(fileext = ".yml")
write_rules(rr, file)
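The example above only writes the rules. A minimal sketch of a full round trip, reading the file back with `read_rules()` and applying the result, might look as follows; the variable names `rr2` and `res` are illustrative assumptions.

```r
library(dataverifyr)

rr <- ruleset(
  rule(mpg > 10),
  rule(cyl %in% c(4, 6, 8))
)

# Write the rules to a temporary yaml file
file <- tempfile(fileext = ".yml")
write_rules(rr, file)

# Read the ruleset back in and use it to check a dataset
rr2 <- read_rules(file)
res <- check_data(mtcars, rr2)
res

# Clean up the temporary file
unlink(file)
```

Storing rules in a yaml file like this can make it easier to keep them under version control separately from the code that runs the checks.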