feat: implement new API that allows invoking YARA modules directly (#52)
With this new API you can use a YARA module as a file-parsing tool and obtain the results produced by the module without any YARA rule involved in the process. This is useful for external tools that want to leverage YARA's file-parsing capabilities for their own purposes.

This required changing the arguments passed to the main function of each module, from &ScanContext to &[u8].
plusvic authored Nov 10, 2023
1 parent e6c9f40 commit 8c0a57b
Showing 17 changed files with 116 additions and 47 deletions.
19 changes: 9 additions & 10 deletions docs/Module Developer's Guide.md
@@ -244,9 +244,8 @@ use crate::modules::prelude::*;
use crate::modules::protos::text::*;

#[module_main]
fn main(ctx: &ScanContext) -> Text {
fn main(data: &[u8]) -> Text {
let mut text_proto = Text::new();
let data = ctx.scanned_data();

// TODO: parse the data and populate text_proto.

@@ -284,18 +283,18 @@ Next comes the module's main function:

```rust
#[module_main]
fn main(ctx: &ScanContext) -> Text {
fn main(data: &[u8]) -> Text {
...
}
```

The module's main function is called for every file scanned by YARA. This
function receives a reference to a `ScanContext` structure that gives you access
to the scanned data. It must return the `Text` structure that was generated from
the `text.proto` file. The main function must have the `#[module_main]` attribute.
Notice that the module's main function doesn't need to be called `main`, it can
have any arbitrary name, as long as it has the `#[module_main]` attribute. Of
course, this attribute can't be used with more than one function per module.
function receives a byte slice with the content of the file being scanned. It
must return the `Text` structure that was generated from the `text.proto` file.
The main function must have the `#[module_main]` attribute. Notice that the
module's main function doesn't need to be called `main`, it can have any
arbitrary name, as long as it has the `#[module_main]` attribute. Of course,
this attribute can't be used with more than one function per module.

The main function usually consists in creating an instance of the protobuf
you previously defined, and populating the protobuf with information extracted from
@@ -310,7 +309,7 @@ use std::io;
use std::io::BufRead;

#[module_main]
fn main(ctx: &ScanContext) -> Text {
fn main(data: &[u8]) -> Text {
// Create an empty instance of the Text protobuf.
let mut text_proto = Text::new();
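
As a rough illustration of the new signature, the sketch below mirrors the shape of a module main function using only the standard library. The `Text` struct is a hypothetical stand-in for the type generated from `text.proto`, and `text_main` is a plain function without the `#[module_main]` attribute, so this is not the module's actual code.

```rust
/// Hypothetical stand-in for the protobuf type generated from `text.proto`.
#[derive(Debug, Default, PartialEq)]
pub struct Text {
    pub num_lines: i64,
    pub num_words: i64,
}

/// Shaped like a module main function after this commit: it receives the
/// scanned bytes directly instead of a `&ScanContext`.
pub fn text_main(data: &[u8]) -> Text {
    // Create an empty instance of the (stand-in) Text structure.
    let mut text_proto = Text::default();

    // Input that is not valid UTF-8 produces an empty result.
    let content = match std::str::from_utf8(data) {
        Ok(content) => content,
        Err(_) => return text_proto,
    };

    for line in content.lines() {
        text_proto.num_lines += 1;
        text_proto.num_words += line.split_whitespace().count() as i64;
    }

    text_proto
}
```

Because the function depends only on a byte slice, it can be exercised with any buffer, for example `text_main(b"hello world\nfoo\n")`, without constructing a scanner first.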

4 changes: 2 additions & 2 deletions yara-x-macros/src/module_main.rs
@@ -11,8 +11,8 @@ pub(crate) fn impl_module_main_macro(

let main_stub = quote! {
use protobuf::MessageDyn;
pub(crate) fn __main__(ctx: &ScanContext) -> Box<dyn MessageDyn> {
Box::new(#fn_name(ctx))
pub(crate) fn __main__(data: &[u8]) -> Box<dyn MessageDyn> {
Box::new(#fn_name(data))
}
};
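
The stub above wraps the module author's function so that every module exposes the same type-erased entry point. A minimal approximation using a declarative macro instead of the real procedural macro might look like the following; the `MessageDyn` trait, `Text`, `text_main`, and `module_main_stub!` are all hypothetical stand-ins introduced for illustration.

```rust
/// Hypothetical stand-in for `protobuf::MessageDyn`.
pub trait MessageDyn {
    fn type_name(&self) -> &'static str;
}

/// Hypothetical stand-in for a module's generated protobuf type.
pub struct Text;

impl MessageDyn for Text {
    fn type_name(&self) -> &'static str {
        "Text"
    }
}

/// The module author's function, with a concrete return type.
pub fn text_main(_data: &[u8]) -> Text {
    Text
}

/// Declarative approximation of the generated stub: it emits a `__main__`
/// function that forwards the byte slice and boxes the concrete result.
macro_rules! module_main_stub {
    ($fn_name:ident) => {
        pub fn __main__(data: &[u8]) -> Box<dyn MessageDyn> {
            Box::new($fn_name(data))
        }
    };
}

module_main_stub!(text_main);

/// Same shape as the `MainFn` alias in `yara-x/src/modules/mod.rs`.
pub type MainFn = fn(&[u8]) -> Box<dyn MessageDyn>;

/// The generated stub fits the common function-pointer type.
pub const TEXT_MAIN: MainFn = __main__;
```

Boxing the result is what lets modules with different return types share a single function-pointer type.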

2 changes: 1 addition & 1 deletion yara-x/build.rs
@@ -105,7 +105,7 @@ fn main() {
modules_rs,
r#"
#[cfg(feature = "{name}-module")]
pub mod {rust_mod};"#,
mod {rust_mod};"#,
)
.unwrap();
}
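
To make the effect of this one-line change concrete, here is a small sketch of how a build script could emit the feature-gated module declarations seen in `modules.rs`. The function name `generate_modules_rs` and its input format are assumptions made for this example; the real `build.rs` discovers modules differently.

```rust
use std::fmt::Write;

/// Builds the text of a `modules.rs`-like file from `(feature_name, rust_mod)`
/// pairs. Modules are declared private (`mod`), matching the change in this
/// commit, rather than `pub mod`.
pub fn generate_modules_rs(modules: &[(&str, &str)]) -> String {
    let mut out = String::from(
        "// File generated automatically by build.rs. Do not edit.\n",
    );
    for (name, rust_mod) in modules {
        // Writing into a String cannot fail, so unwrap is fine here.
        writeln!(out, "#[cfg(feature = \"{}-module\")]", name).unwrap();
        writeln!(out, "mod {};", rust_mod).unwrap();
    }
    out
}
```

With the modules private, the only public path to their output types is the set of re-exports in the new `mods` module.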
3 changes: 2 additions & 1 deletion yara-x/src/lib.rs
@@ -45,7 +45,6 @@ pub use compiler::compile;
pub use compiler::CompileError;
pub use compiler::CompileErrorInfo;
pub use compiler::Compiler;
pub use compiler::EmitWasmError;
pub use compiler::Error;
pub use compiler::Rules;
pub use compiler::SerializationError;
@@ -62,6 +61,8 @@ pub use scanner::ScanError;
pub use scanner::ScanResults;
pub use scanner::Scanner;

pub use modules::mods;

pub use variables::Variable;
pub use variables::VariableError;

6 changes: 2 additions & 4 deletions yara-x/src/modules/elf/mod.rs
@@ -6,9 +6,7 @@ and sections information, exported symbols, target platform, etc.

use itertools::Itertools;
use lazy_static::lazy_static;
use md5;
use rustc_hash::FxHashSet;
use tlsh;

use crate::modules::prelude::*;
use crate::modules::protos::elf::*;
@@ -19,8 +17,8 @@ pub mod parser;
mod tests;

#[module_main]
fn main(ctx: &ScanContext) -> ELF {
match parser::ElfParser::new().parse(ctx.scanned_data()) {
fn main(data: &[u8]) -> ELF {
match parser::ElfParser::new().parse(data) {
Ok(elf) => elf,
Err(_) => ELF::new(),
}
2 changes: 1 addition & 1 deletion yara-x/src/modules/hash/mod.rs
@@ -23,7 +23,7 @@ thread_local!(
);

#[module_main]
fn main(_ctx: &ScanContext) -> Hash {
fn main(_data: &[u8]) -> Hash {
// With every scanned file the cache must be cleared.
SHA256_CACHE.with(|cache| cache.borrow_mut().clear());
SHA1_CACHE.with(|cache| cache.borrow_mut().clear());
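
The hash module ignores its input in the main function and only resets per-thread caches. A minimal sketch of that pattern follows, assuming a cache keyed by `(offset, size)`; the key layout, the names `CACHE`, `cached_digest`, `reset_cache`, and `cache_len`, and the placeholder digest are all hypothetical and not taken from the module.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    /// Hypothetical per-thread cache keyed by (offset, size).
    static CACHE: RefCell<HashMap<(usize, usize), String>> =
        RefCell::new(HashMap::new());
}

/// Returns a placeholder "digest" (the byte sum in hex, not a real hash) for
/// a range of `data`, caching the result. Returns `None` if the range is out
/// of bounds.
pub fn cached_digest(data: &[u8], offset: usize, size: usize) -> Option<String> {
    let range = data.get(offset..offset.checked_add(size)?)?;
    Some(CACHE.with(|cache| {
        cache
            .borrow_mut()
            .entry((offset, size))
            .or_insert_with(|| {
                let sum: u64 = range.iter().map(|b| *b as u64).sum();
                format!("{:x}", sum)
            })
            .clone()
    }))
}

/// Shaped like the module's main function: the data is unused, but entries
/// cached for the previous input must not leak into the next one.
pub fn reset_cache(_data: &[u8]) {
    CACHE.with(|cache| cache.borrow_mut().clear());
}

/// Number of cached entries on the current thread.
pub fn cache_len() -> usize {
    CACHE.with(|cache| cache.borrow().len())
}
```

In this sketch the key does not identify the input, so a stale entry would be returned for a different buffer unless the cache is cleared between inputs.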
4 changes: 2 additions & 2 deletions yara-x/src/modules/lnk/mod.rs
@@ -17,8 +17,8 @@ use crate::modules::protos::lnk::*;
pub mod parser;

#[module_main]
fn main(ctx: &ScanContext) -> Lnk {
match parser::LnkParser::new().parse(ctx.scanned_data()) {
fn main(data: &[u8]) -> Lnk {
match parser::LnkParser::new().parse(data) {
Ok(lnk) => lnk,
Err(_) => {
let mut lnk = Lnk::new();
5 changes: 1 addition & 4 deletions yara-x/src/modules/macho/mod.rs
@@ -2574,13 +2574,10 @@ fn ep_for_arch_subtype(
/// code isn’t interrupted by issues with individual files during bulk
/// processing.
#[module_main]
fn main(ctx: &ScanContext) -> Macho {
fn main(data: &[u8]) -> Macho {
// Create an empty instance of the Mach-O protobuf
let mut macho_proto = Macho::new();

// Get a &[u8] slice with the content of the file being scanned.
let data = ctx.scanned_data();

// If data is too short to be valid Mach-O file, return empty protobuf
if data.len() < VALID_MACHO_LENGTH {
#[cfg(feature = "logging")]
76 changes: 73 additions & 3 deletions yara-x/src/modules/mod.rs
@@ -3,8 +3,6 @@ use protobuf::reflect::MessageDescriptor;
use protobuf::MessageDyn;
use rustc_hash::FxHashMap;

use crate::scanner::ScanContext;

pub mod protos {
include!(concat!(env!("OUT_DIR"), "/protos/mod.rs"));
}
@@ -26,7 +24,7 @@ pub(crate) mod prelude {
include!("modules.rs");

/// Type of module's main function.
type MainFn = fn(&ScanContext) -> Box<dyn MessageDyn>;
type MainFn = fn(&[u8]) -> Box<dyn MessageDyn>;

/// Describes a YARA module.
pub(crate) struct Module {
@@ -120,3 +118,75 @@ lazy_static! {
modules
};
}

pub mod mods {
/*! Utility functions and structures for invoking YARA modules directly.
The utility functions [`invoke_mod`] and [`invoke_mod_dyn`] allow leveraging
YARA modules for parsing some file formats independently of any YARA rule.
With these functions you can pass arbitrary data to a YARA module and obtain
the same data structure that is accessible to YARA rules and which you use
in your rule conditions.
This allows external projects to benefit from YARA's file-parsing
capabilities for their own purposes.
*/

/// Data structure returned by the `elf` module.
pub use super::protos::elf::ELF;
/// Data structure returned by the `lnk` module.
pub use super::protos::lnk::Lnk;
/// Data structure returned by the `macho` module.
pub use super::protos::macho::Macho;

/// Invoke a YARA module with arbitrary data.
///
/// <br>
///
/// YARA modules typically parse specific file formats, returning structures
/// that contain information about the file. These structures are used in YARA
/// rules for expressing powerful and rich conditions. However, being able to
/// access this information outside of YARA rules can also be beneficial.
///
/// <br>
///
/// This function allows the direct invocation of a YARA module for parsing
/// arbitrary data. It returns the structure produced by the module, which
/// depends upon the invoked module. The result will be [`None`] if the
/// module does not exist, or if it doesn't produce any information for
/// the input data.
///
/// `T` must be one of the structure types returned by a YARA module, which
/// are defined in [`crate::mods`].
///
/// # Example
/// ```rust
/// # use yara_x;
/// # let data = &[];
/// let elf_info = yara_x::mods::invoke_mod::<yara_x::mods::ELF>(data);
/// ```
pub fn invoke_mod<T: protobuf::MessageFull>(
data: &[u8],
) -> Option<Box<T>> {
let module_output = invoke_mod_dyn::<T>(data)?;
Some(<dyn protobuf::MessageDyn>::downcast_box(module_output).unwrap())
}

/// Invoke a YARA module with arbitrary data, but returns a dynamic
/// structure.
///
/// This function is similar to [`invoke_mod`] but its result is a dynamic-
/// dispatch version of the structure returned by the YARA module.
pub fn invoke_mod_dyn<T: protobuf::MessageFull>(
data: &[u8],
) -> Option<Box<dyn protobuf::MessageDyn>> {
let descriptor = T::descriptor();
let proto_name = descriptor.full_name();
let (_, module) =
super::BUILTIN_MODULES.iter().find(|(_, module)| {
module.root_struct_descriptor.full_name() == proto_name
})?;

Some(module.main_fn?(data))
}
}
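
The lookup-then-downcast flow of `invoke_mod` and `invoke_mod_dyn` can be sketched with the standard library alone. In this approximation `std::any::Any` and `TypeId` stand in for `protobuf::MessageDyn` and the message descriptor's full name, and the `Elf` and `Time` structs, `elf_main`, and the registry are hypothetical; the real code searches `BUILTIN_MODULES` by protobuf name.

```rust
use std::any::{Any, TypeId};

/// Hypothetical module outputs (stand-ins for generated protobuf types).
#[derive(Debug, Default, PartialEq)]
pub struct Elf {
    pub is_elf: bool,
}

#[derive(Debug, Default, PartialEq)]
pub struct Time;

/// Type-erased main function, analogous to `MainFn`.
type MainFn = fn(&[u8]) -> Box<dyn Any>;

struct Module {
    /// Identifies the module's root structure.
    root_struct: TypeId,
    /// Some modules have no main function.
    main_fn: Option<MainFn>,
}

fn elf_main(data: &[u8]) -> Box<dyn Any> {
    Box::new(Elf { is_elf: data.starts_with(b"\x7fELF") })
}

fn builtin_modules() -> Vec<Module> {
    vec![
        Module { root_struct: TypeId::of::<Elf>(), main_fn: Some(elf_main as MainFn) },
        Module { root_struct: TypeId::of::<Time>(), main_fn: None },
    ]
}

/// Finds the module whose root structure is `T` and runs its main function,
/// returning the type-erased result.
pub fn invoke_mod_dyn<T: Any>(data: &[u8]) -> Option<Box<dyn Any>> {
    let wanted = TypeId::of::<T>();
    let modules = builtin_modules();
    let module = modules.iter().find(|m| m.root_struct == wanted)?;
    Some(module.main_fn?(data))
}

/// Like `invoke_mod_dyn`, but downcasts the result to the concrete type.
pub fn invoke_mod<T: Any>(data: &[u8]) -> Option<Box<T>> {
    invoke_mod_dyn::<T>(data)?.downcast::<T>().ok()
}
```

In this sketch the result is `None` both when no registered module produces `T` and when the matching module has no main function, which mirrors the two `?` operators in the original `invoke_mod_dyn`.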
18 changes: 9 additions & 9 deletions yara-x/src/modules/modules.rs
@@ -1,19 +1,19 @@
// File generated automatically by build.rs. Do not edit.
#[cfg(feature = "string-module")]
pub mod string;
mod string;
#[cfg(feature = "macho-module")]
pub mod macho;
mod macho;
#[cfg(feature = "elf-module")]
pub mod elf;
mod elf;
#[cfg(feature = "text-module")]
pub mod text;
mod text;
#[cfg(feature = "hash-module")]
pub mod hash;
mod hash;
#[cfg(feature = "test_proto2-module")]
pub mod test_proto2;
mod test_proto2;
#[cfg(feature = "lnk-module")]
pub mod lnk;
mod lnk;
#[cfg(feature = "time-module")]
pub mod time;
mod time;
#[cfg(feature = "test_proto3-module")]
pub mod test_proto3;
mod test_proto3;
2 changes: 1 addition & 1 deletion yara-x/src/modules/string.rs
@@ -2,7 +2,7 @@ use crate::modules::prelude::*;
use crate::modules::protos::string::*;

#[module_main]
fn main(_ctx: &ScanContext) -> String {
fn main(_data: &[u8]) -> String {
// Nothing to do, but we have to return our protobuf
String::new()
}
4 changes: 2 additions & 2 deletions yara-x/src/modules/test_proto2/mod.rs
@@ -53,7 +53,7 @@ fn to_int(ctx: &ScanContext, string: RuntimeString) -> Option<i64> {
}

#[module_main]
fn main(ctx: &ScanContext) -> TestProto2 {
fn main(data: &[u8]) -> TestProto2 {
let mut test = TestProto2::new();

test.set_int32_zero(0);
@@ -131,7 +131,7 @@ fn main(ctx: &ScanContext) -> TestProto2 {

test.set_bool_proto(true);

test.set_file_size(ctx.scanned_data().len() as u64);
test.set_file_size(data.len() as u64);

test
}
2 changes: 1 addition & 1 deletion yara-x/src/modules/test_proto3/mod.rs
@@ -2,7 +2,7 @@ use crate::modules::prelude::*;
use crate::modules::protos::test_proto3::TestProto3;

#[module_main]
fn main(_ctx: &ScanContext) -> TestProto3 {
fn main(_data: &[u8]) -> TestProto3 {
let mut test = TestProto3::new();

test.int32_zero = 0;
7 changes: 7 additions & 0 deletions yara-x/src/modules/tests.rs
@@ -2,6 +2,13 @@ use std::fs;
use std::io::Write;
use std::path::Path;

/// Utility function that reads a file in [`Intel HEX`][1] (ihex) format and
/// returns the binary data contained in it.
///
/// All test files in this repository are stored in ihex format in order to
/// avoid storing executable files (some of them malware) in binary form.
///
/// [1]: https://en.wikipedia.org/wiki/Intel_HEX
pub fn create_binary_from_ihex<P: AsRef<Path>>(
path: P,
) -> anyhow::Result<Vec<u8>> {
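
For readers unfamiliar with the format, the sketch below decodes the data records of an Intel HEX text into bytes. It is not the repository's `create_binary_from_ihex` implementation: the name `decode_ihex` is hypothetical, and the sketch simplifies by appending data in file order and ignoring record addresses and extended-address records.

```rust
/// Decodes Intel HEX text into the bytes carried by its data records.
/// Simplification: data is appended in file order; addresses are ignored.
pub fn decode_ihex(text: &str) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    for (i, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        // Every record starts with a colon followed by hex digit pairs.
        let hex = line
            .strip_prefix(':')
            .ok_or_else(|| format!("line {}: missing ':'", i + 1))?;
        if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("line {}: invalid hex digits", i + 1));
        }
        let bytes: Vec<u8> = (0..hex.len())
            .step_by(2)
            .map(|p| u8::from_str_radix(&hex[p..p + 2], 16))
            .collect::<Result<_, _>>()
            .map_err(|e| format!("line {}: {}", i + 1, e))?;
        // Layout: count (1) + address (2) + type (1) + data (count) + checksum (1).
        if bytes.len() < 5 || bytes.len() != bytes[0] as usize + 5 {
            return Err(format!("line {}: bad record length", i + 1));
        }
        // All bytes of a record, checksum included, sum to zero modulo 256.
        let sum = bytes.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
        if sum != 0 {
            return Err(format!("line {}: bad checksum", i + 1));
        }
        match bytes[3] {
            0x00 => out.extend_from_slice(&bytes[4..bytes.len() - 1]), // data
            0x01 => break,                                             // end of file
            _ => {} // other record types are skipped in this sketch
        }
    }
    Ok(out)
}
```

For instance, the record `:0300000048690A42` carries the three bytes `48 69 0A`, and `:00000001FF` marks the end of the file.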
5 changes: 1 addition & 4 deletions yara-x/src/modules/text.rs
@@ -18,13 +18,10 @@ use lingua::{Language, LanguageDetectorBuilder};
/// This function must return an instance of the protobuf message indicated
/// in the `root_message` option in `text.proto`.
#[module_main]
fn main(ctx: &ScanContext) -> Text {
fn main(data: &[u8]) -> Text {
// Create an empty instance of the Text protobuf.
let mut text_proto = Text::new();

// Get a &[u8] slice with the content of the file being scanned.
let data = ctx.scanned_data();

let mut num_lines = 0;
let mut num_words = 0;

2 changes: 1 addition & 1 deletion yara-x/src/modules/time.rs
@@ -3,7 +3,7 @@ use crate::modules::protos::time::*;
use std::time::{SystemTime, UNIX_EPOCH};

#[module_main]
fn main(_ctx: &ScanContext) -> Time {
fn main(_data: &[u8]) -> Time {
// Nothing to do, but we have to return our protobuf
Time::new()
}
2 changes: 1 addition & 1 deletion yara-x/src/scanner/mod.rs
@@ -448,7 +448,7 @@ impl<'r> Scanner<'r> {
// the data is specified by the .proto file associated to the
// module.
let module_output = if let Some(main_fn) = module.main_fn {
main_fn(ctx)
main_fn(data.as_ref())
} else {
// Implement the case in which the module doesn't have a main
// function and the serialized data should be provided by the