Generic Importer #11

Open
mlr0p opened this issue Nov 11, 2020 · 2 comments

@mlr0p

mlr0p commented Nov 11, 2020

It would be beneficial to have a generic importer (or at least something close to generic) that can parse and import an arbitrary dump in any format.

@SoftPoison

Expanding on this: the reason this is necessary is that a lot of code is duplicated between the importers, which makes it quite daunting to write a new importer or fix issues. Having some sort of generic importer or importing method should both reduce that duplication and lower the barrier to entry for newcomers.

Go has interfaces and struct embedding (https://golang.org/doc/effective_go.html#embedding) rather than classical inheritance, so we could use those for structuring an importer.
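
To make the embedding idea concrete, here is a minimal sketch. `BaseParser` and `CSVParser` are hypothetical names invented for illustration and are not existing steamer types; the interface is copied from the proposal in this comment. The point is only that a parser can embed a shared default and implement just the method that differs:

```go
package main

import (
	"fmt"
	"strings"
)

// LineParser mirrors the interface proposed in this issue.
type LineParser interface {
	ParseLine(line string) ([]interface{}, error)
	EstimateCount(line string) (int64, error)
}

// BaseParser is a hypothetical helper that supplies a default EstimateCount.
type BaseParser struct{}

// EstimateCount assumes one record per non-blank line.
func (BaseParser) EstimateCount(line string) (int64, error) {
	if strings.TrimSpace(line) == "" {
		return 0, nil
	}
	return 1, nil
}

// CSVParser (hypothetical) embeds BaseParser, so it only has to add ParseLine.
type CSVParser struct {
	BaseParser
}

// ParseLine returns the comma-separated fields of the line as a single record.
func (CSVParser) ParseLine(line string) ([]interface{}, error) {
	return []interface{}{strings.Split(line, ",")}, nil
}

func main() {
	// CSVParser satisfies LineParser through the promoted EstimateCount method.
	var p LineParser = CSVParser{}
	n, _ := p.EstimateCount("a@example.com,hunter2")
	recs, _ := p.ParseLine("a@example.com,hunter2")
	fmt.Println(n, recs)
}
```

Embedding promotes the methods of the inner type, so a new importer would only need to write the parts that are specific to its dump format.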
In terms of how this should work on a technical level, here's what I propose:

importers/util/common.go

This file should be something of the form:

package util

type LineParser interface {
    ParseLine(line string) ([]interface{}, error)
    EstimateCount(line string) (int64, error)
}

type Importer struct {
    parser LineParser
    bar *pb.ProgressBar
    numThreads int
    threader chan string
    doner chan bool
    mongo *mgo.Session
    verbose bool
    fileName string
    // ... other variables that it needs
}

func MakeImporter(parser LineParser, verbose bool /*, other variables... */) *Importer {
    // basically just do what the main funcs of the current importers do, but in here:
    // initialise everything, but do not run the main loop (yet)
    // if verbose is enabled, also set up the progress bar, calling
    // parser.EstimateCount(line) to get an estimate of how many creds are on each line
    return &Importer{parser: parser, verbose: verbose /* , other fields ... */}
}

func (i *Importer) Run() {
    // this part should have the threader <- r.ReadLine() loop and the <- doner loop
}

func (i *Importer) importLine() {
    // basically just copy paste the current importLine() functionality, but call i.parser.ParseLine(line) instead for the parsing (and handle any errors)
}

importers/importer-(sql-)template.go

These two files (the plain template and the SQL template) should show how easy the new system will be to use. They should look something like:

package main

import (
    // plus the bson package the existing importers already use (for bson.ObjectId)
    "github.com/zxsecurity/steamer/importers/util"
)

type GenericData struct {
	Id           bson.ObjectId `json:"id" bson:"_id,omitempty"`
	MemberID     int           `bson:"memberid"`
	Email        string        `bson:"email"`
	Liame        string        `bson:"liame"`
	PasswordHash string        `bson:"passwordhash"`
	Password     string        `bson:"password"`
	Breach       string        `bson:"breach"`
}

type TemplateLineParser struct {}

func (t TemplateLineParser) ParseLine(line string) ([]interface{}, error) {
    data := make([]interface{}, 0)
    // code to parse a line into GenericData values and append them to data

    return data, nil
}

func (t TemplateLineParser) EstimateCount(line string) (int64, error) {
    // code to estimate how many pieces of data are in a line (for the progress bar)
    return 1, nil // placeholder: assumes one record per line
}

func main() {
    parser := TemplateLineParser{}
    
    // other setup code ...

    importer := util.MakeImporter(parser /* , other args ... */)
    importer.Run()
}
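
For a concrete feel of what a filled-in template might look like, here is a sketch of a parser for `email:password` lines. It is illustrative only: `Record` is a cut-down, hypothetical version of GenericData without the bson tags, `ColonLineParser` and `reverse` are invented names, and it assumes that Liame holds the reversed email, as the field name suggests.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical, trimmed-down stand-in for GenericData.
type Record struct {
	Email    string
	Liame    string // assumed to be the email reversed
	Password string
	Breach   string
}

// ColonLineParser (hypothetical) parses "email:password" lines for one breach.
type ColonLineParser struct {
	Breach string
}

// reverse returns s with its runes in reverse order.
func reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// ParseLine turns one line into a single Record, or reports a malformed line.
func (p ColonLineParser) ParseLine(line string) ([]interface{}, error) {
	line = strings.TrimSpace(line)
	// Split on the first colon only, since passwords may contain colons.
	parts := strings.SplitN(line, ":", 2)
	if len(parts) != 2 || parts[0] == "" {
		return nil, fmt.Errorf("malformed line: %q", line)
	}
	rec := Record{
		Email:    parts[0],
		Liame:    reverse(parts[0]),
		Password: parts[1],
		Breach:   p.Breach,
	}
	return []interface{}{rec}, nil
}

// EstimateCount assumes this format always has one credential per line.
func (p ColonLineParser) EstimateCount(line string) (int64, error) {
	return 1, nil
}

func main() {
	parser := ColonLineParser{Breach: "example-breach"}
	recs, err := parser.ParseLine("user@example.com:pa:ss")
	fmt.Println(recs, err)
}
```

With the shared loop living in util, this is roughly all a new importer would need to contain besides its main function.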

@SoftPoison

SoftPoison commented Nov 12, 2020

The plan to close this issue:

  1. Implement modular parsing to reduce code duplication
  2. Move the old code to the new system
  3. Using the modular parsing, implement a generic parser that should work for most things

One pull request should be made for 1. and 2. and a separate one for 3.
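
The issue does not say how the generic parser in step 3 should work. One possible shape, sketched here purely as an assumption, is a parser configured with a delimiter and a column order, so that most delimited dumps need configuration rather than new code. `GenericLineParser`, `Delimiter`, and `Columns` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// GenericLineParser is a hypothetical parser driven by configuration.
type GenericLineParser struct {
	Delimiter string   // for example ":" or ";"
	Columns   []string // field name for each column; "" means skip the column
}

// ParseLine splits the line on Delimiter and maps each column to its name.
func (g GenericLineParser) ParseLine(line string) ([]interface{}, error) {
	if g.Delimiter == "" {
		return nil, errors.New("no delimiter configured")
	}
	fields := strings.Split(strings.TrimRight(line, "\r\n"), g.Delimiter)
	if len(fields) != len(g.Columns) {
		return nil, fmt.Errorf("expected %d columns, got %d", len(g.Columns), len(fields))
	}
	record := make(map[string]string)
	for i, name := range g.Columns {
		if name != "" {
			record[name] = fields[i]
		}
	}
	return []interface{}{record}, nil
}

// EstimateCount assumes one record per line for delimited dumps.
func (g GenericLineParser) EstimateCount(line string) (int64, error) {
	return 1, nil
}

func main() {
	g := GenericLineParser{
		Delimiter: ";",
		Columns:   []string{"memberid", "", "email", "passwordhash"},
	}
	recs, err := g.ParseLine("42;ignored;a@example.com;5f4dcc3b")
	fmt.Println(recs, err)
}
```

A strict column count is used so that malformed lines are reported instead of silently imported; a real implementation might choose to be more lenient.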
