
BUG: Misc Bugs & PR request offer #15

Open
alexhiggins732 opened this issue May 19, 2023 · 7 comments
alexhiggins732 (Collaborator) commented May 19, 2023

Hi,

Thanks for the time and effort to put this together. The resources linked to are great.

After testing the application, I found a few bugs/issues that I would like to point out. Since I have addressed them in my fork, I would like to know whether you would like me to open a PR to address any or all of them.

Details:

Testing with the following n:
316300508827
316406230007
3218147
4295229443
536870911

1. NullReferenceException: Serialization.Load.cs, Line 32:

Sometimes a stray leading comma appears in the JSON, resulting in the deserialized List<Relation> containing a null relation. I did not spend time tracking down the source; instead I put in a guard that calls .TrimStart(',') on the JSON read from disk to prevent the exception.
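
A minimal sketch of the guard, assuming Newtonsoft.Json-style deserialization (the actual method name and file handling in Serialization.Load.cs differ):

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json; // assumption: Newtonsoft.Json-style deserialization

public static class RelationsLoaderSketch
{
    // Illustrative only; the real code lives in Serialization.Load.cs.
    public static List<Relation> LoadRelations(string relationsFilePath)
    {
        string json = File.ReadAllText(relationsFilePath);

        // A stray leading ',' sometimes ends up in the saved JSON and makes the
        // deserializer produce a null entry in the list; trim it defensively.
        json = json.TrimStart(',');

        return JsonConvert.DeserializeObject<List<Relation>>(json);
    }
}
```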

2. Exception thrown loading existing configuration data. Fixed with an overload option in CreateGnfs.cs, Line 13, to allow the user to create a new configuration, overwriting the one already on disk.

I added an optional default parameter, bool createNewData = false, to allow overwriting existing data for a given n. Sometimes the factorization fails and new parameters need to be set. This could be because of user parameters or other runtime issues in the code. In the latter case, an exception is sometimes thrown when trying to load the configuration from disk. In any case, the parameter allows creating new data, instead of loading the existing data, and overwrites the existing configuration if one exists (see the sketch after the list below).

To support the new optional overload:

  • MainForm.cs
  • Line 222: buttonCreate is enabled so the user can choose to load existing data or create new data.
  • Line 333: Added the parameter true to the CreateGnfs call made when btnCreate is clicked
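
A rough sketch of the shape of the overload (the real CreateGnfs signature has more parameters, and GetSaveLocation is a hypothetical helper name used here only for illustration):

```csharp
using System.IO;
using System.Numerics;

// Sketch only; not the actual code in CreateGnfs.cs.
public static GNFS CreateGnfs(BigInteger n, /* ...existing parameters... */ bool createNewData = false)
{
    string saveDirectory = DirectoryLocations.GetSaveLocation(n); // hypothetical helper

    if (createNewData && Directory.Exists(saveDirectory))
    {
        // A fresh start was requested: discard the configuration already on disk
        // instead of trying (and possibly failing) to deserialize it.
        Directory.Delete(saveDirectory, recursive: true);
    }

    // ...fall through to the normal create-or-load logic...
}
```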

3. IsRationalIrreducible exception throw in SquareFinder.cs, Line 88:

This exception was thrown when !IsRationalIrreducible evaluated as true. I hit this exception twice; it crashes the form, so I decided to comment the throw out. Since IsRationalIrreducible is logged, I left the check itself in, but perhaps the entire block can be commented out. Additionally, even when the condition hit, the algorithm went on to factor the number. Not sure if that was supposed to happen. I also don't know the math, so perhaps when IsRationalIrreducible == false the polynomial can be used to factor n directly?
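
Roughly, the change looks like this (sketch only; the actual members around SquareFinder.cs line 88 may differ):

```csharp
// The check is still evaluated and logged, but the throw that crashed the
// form when IsRationalIrreducible was false is commented out.
Log($"IsRationalIrreducible: {IsRationalIrreducible}");

// if (!IsRationalIrreducible)
// {
//     throw new Exception("IsRationalIrreducible evaluated to false.");
// }
```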

4. Unit Test Refactoring - I refactored the unit tests so that running any given step requires that the previous dependent steps have been called and completed successfully. This was inspired by your existing check in step two. Otherwise, when you run a step, it could wait forever for a previous step to complete even though that step was never executed.

5. Unit Test02 OutOfMemoryException

Although the Windows Forms application is working, Unit Test02 does not complete on my system. It gets stuck after finding 38 relations and never finds any more, even though it needs 70 to complete. The end result is that, if you let it run long enough, an OutOfMemoryException is thrown. To be clear, it is not the GNFS code throwing the exception. Instead, the logging in the test fills up the unit test framework's in-memory buffer for the test's standard output due to the repeated Console.WriteLine calls.

A fix for this would not be part of the PR. Instead, I think a better fix would be to put a max-iterations parameter in the core code. This (potentially optional, defaulted) parameter could then be used either to throw an exception or to return some result indicating failure, to guard against the collect-relations step running forever. For example, max iterations could limit the total number of iterations allowed to run, or it could specify how many iterations are allowed to execute with no relations being found. The latter could be helpful in letting the user know that the supplied polynomial and other hyperparameters are inadequate for collecting relations, for example if they specified too low a smoothness bound.
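
A sketch of what such a guard could look like (CollectRelations, SieveNextCandidate, and maxIterationsWithoutNewRelation are hypothetical names, not the library's actual API):

```csharp
// Hypothetical guard for the collect-relations loop: give up after too many
// consecutive iterations that produce no new smooth relation.
public static bool CollectRelations(GNFS gnfs, int relationsNeeded, int maxIterationsWithoutNewRelation = 100_000)
{
    int found = 0;
    int iterationsSinceLastRelation = 0;

    while (found < relationsNeeded)
    {
        bool foundRelation = SieveNextCandidate(gnfs); // hypothetical single sieving step

        if (foundRelation)
        {
            found++;
            iterationsSinceLastRelation = 0;
        }
        else if (++iterationsSinceLastRelation >= maxIterationsWithoutNewRelation)
        {
            // Parameters (polynomial, smoothness bounds, ...) are probably inadequate;
            // report failure instead of looping (and logging) forever.
            return false;
        }
    }

    return true;
}
```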


Thanks again, and let me know if you would like me to submit a PR to address any or all of these.

My fixes are here: https://github.com/alexhiggins732/GNFS-DotNet-POC/tree/feature/allow-overwriting-existing-configuration

Alexander Higgins.

alexhiggins732 changed the title from "BUG: [Title for your bug]" to "BUG: Misc Bugs & PR request offer" on May 20, 2023
AdamWhiteHat (Owner) commented:

Awesome! Thanks for taking an interest in my project.

Yes please; I'll start taking a look at the changes on your fork now. Submit a PR and I will merge them in.

Yeah, I think that's a good solution regarding the OutOfMemoryException. It's definitely possible to pick a polynomial that yields an exceptionally low number of relations and/or to pick the Prime FactorBaseMax bounds too small for the size of the polynomial, such that the Rational & Algebraic Norms in the Relation class are simply too large to factor completely with the size of primes the factor base is restricted to.
For the unit testing specifically, you can specify a max run time for any test (NUnit.Framework.MaxTime Attribute) and/or a max test retry count (NUnit.Framework.Retry Attribute).
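
As a quick usage sketch of those attributes (the values are just examples):

```csharp
using NUnit.Framework;

[TestFixture]
public class RelationSieveTests
{
    // MaxTime fails the test if it runs longer than the given number of
    // milliseconds (it does not abort the test body), and Retry re-runs a
    // failed test up to the given total number of attempts.
    [Test]
    [MaxTime(600_000)]
    [Retry(2)]
    public void Test02_CollectRelations()
    {
        // ... existing Test02 body ...
    }
}
```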

Still, it would be good to have the Core library limit itself or detect this condition.
As the search proceeds and the values of A and B increase in size, so too will the size of the Rational & Algebraic Norms. At some point, you get diminishing returns, and then no returns as the Norms grow too large to factor with the chosen Factor Base size. So, it should be possible to choose a reasonable bound for the Relation.B parameter given the Factor Base sizes and/or polynomial size, or to detect the condition that no more factorized relations are forthcoming. I'll look into this, think it over, and find a reasonable approach.

The more intelligent and forgiving we can make the parameter selection process, the better. Any kind of guardrail that can be added to help guide the user down a path of success would help, whether by dynamically growing values as the search completes a block or by putting in a stop-gap when it becomes clear the user is conducting a fruitless search. Choosing the sweet spot for reasonable parameters is the toughest part to get right and the toughest part for a beginner to reason about and understand, and it is an obstacle that anyone hoping to use the application has to contend with immediately.
Hmm, maybe I'll use GitHub's wiki feature to elucidate all of the parameter values: how they interact with each other and how they contribute to the magnitude of the numbers that will have to be successfully factorized to proceed on to the later matrix and square root steps. Just demystifying that first step of finding relations would be huge and would really serve to bridge that gap and lower the barrier to entry.

Some more technical remarks on this process:
For a relation's rational or algebraic norm, if two of its prime factors exceed the RationalFactorBaseMax or the AlgebraicFactorBaseMax, respectively, then it's going to fail to factor that norm. If the norm is partially factored and the single remaining prime factor exceeds the Rational/AlgebraicFactorBaseMax, the code could still recognize this condition by performing a probabilistic primality test on the remaining value. However, I see the code that does this is currently commented out (\RelationSieve\Relation.cs, lines 92-120). As it stands, then, all of the prime factors of the norms must be less than the respective Factor Base bound for a relation to become a factored ("smooth") relation.
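
A rough sketch of that cofactor check (norm, smoothPart, factorBaseMax, and IsProbablePrime are illustrative placeholders, not the actual members of Relation.cs):

```csharp
// Whatever is left of the norm after dividing out all factor-base primes.
BigInteger remaining = norm / smoothPart;

if (remaining > factorBaseMax && IsProbablePrime(remaining, certainty: 20))
{
    // The leftover cofactor is (almost certainly) a single prime larger than the
    // factor base bound, so this norm can never factor completely over the factor
    // base; the relation can be rejected early instead of being retried.
    return false;
}
```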

alexhiggins732 (Collaborator, Author) commented:

Great! I look forward to collaborating with you. I will work on those suggestions before I submit a PR.

In the meantime, I wrote a new library for serializing .poly files as used by GGNFS, msieve, etc., along with a good amount of unit test coverage for the library. We can talk about a separate PR for that after this one, but since I noticed you had an issue on the board for loading polynomials, I went ahead with it while waiting for your response. We will need to discuss how best to convert that format into the one used in this library.
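
For context, GGNFS/msieve .poly files are essentially "key: value" lines (n, skew, c0..c5, Y0/Y1, type, ...). A minimal parsing sketch, purely for illustration and not the API of the serializer library itself:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Numerics;

public static class PolyFileSketch
{
    // Reads "key: value" lines into a dictionary; blank or malformed lines are skipped.
    public static Dictionary<string, string> Parse(string path)
    {
        var entries = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in File.ReadLines(path))
        {
            int colon = line.IndexOf(':');
            if (colon <= 0) { continue; }

            string key = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();
            entries[key] = value;
        }
        return entries;
    }
}

// Example: BigInteger n = BigInteger.Parse(PolyFileSketch.Parse("rsa100.poly")["n"]);
```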

When I wrapped up the serializer, I wrote a console app to run polynomial selection using msieve against all of the RSA challenge numbers on the Wikipedia page. This should give us at least a starting set of targets for polynomial selection for N of various sizes.

Those changes are here.

Poly selection is currently running on my GTX 3080 to generate .poly files for the RSA challenge numbers. As a starting point I added poly files for RSA-100 and RSA-200 to a commit on this feature branch only, until we can work through the best way forward for dealing with poly files.

Also, there is an RSA-100 poly in that branch. It might be helpful for you to look at its parameters, both to see what changed with respect to why it can't be factored here anymore, and to give insight into how to convert from that format to the one being used here.

AdamWhiteHat (Owner) commented May 20, 2023

Oh I completely forgot about the project board!
Despite that, it's not very out of date. The polynomial is its own library and the prime generator is self-contained and modular. I updated it to reflect that.

I will add you as a contributor to the project so you can add, move, and complete items on there, review PRs, and the like.

The ability to serialize/deserialize GGNFS poly and save files certainly is an interesting idea. One problem, though, is that I'm currently not using a skewed polynomial; the implementation has no need for, or notion of, a skew parameter. This could (should?) change: a skewed polynomial allows for better/faster sieving/relation finding by keeping those algebraic and rational norms smaller for longer and for larger values of A and B.
I'll refresh my knowledge about what a skewed polynomial means implementation-wise (I think this means it's a homogeneous polynomial, IIRC) and upload the relevant literature on the subject. Then I can give an estimate of the level of effort and complexity.

AdamWhiteHat (Owner) commented:

@alexhiggins732

Hey, I had a question about your DotMpi project: Can this library be used to lock a thread to a single processor (SetThreadAffinityMask)?

See, my FastPrimeSieve class is a segmented Sieve of Eratosthenes where each segment is a bit-packed array of uint the size of the L1 CPU cache, and the array is accessed sequentially or in order of increasing bit offsets. The idea is to build and keep the whole thing in the CPU cache.
It occurs to me that the code makes the implicit assumption that it will be run on the same processor the whole time. When I test this code with a process whose only task is to iterate the primes and write them to disk as fast as the disk can be written to, this is almost certainly the case. However, my GNFS project is doing a whole lot more with the prime numbers it iterates than just writing them to disk, including lots of conditional logic and calling delegates.
I haven't actually measured it, and I don't know for a fact that the prime sieve is ever moved across processors, but I have been looking into how to ensure that it isn't, and it's a lot more involved than I originally thought. Perhaps in practice it never comes up, or perhaps if I ran the prime sieve as a Task, the Task class offers me that level of control, or perhaps moving the prime sieve into its own assembly with some special attributes or compiler options would be enough; more research is needed.
But I saw your DotMpi library and was wondering 1) if it has such functionality, and 2) if not, whether, in the course of making said library, you learned enough about threading, processor affinity, and how .NET behaves to give me some insight into the best/simplest way to go about pinning a method or piece of code to one processor?

AdamWhiteHat added a commit that referenced this issue Jul 18, 2023
1) NullReferenceException when loading Relations from disk
2) Allow create new/overwrite already existing factorization files. Added confirmation prompt. Added Relations files removal
3) Comment out IsRationalIrreducible exception throw in SquareFinder (pretty sure it makes no sense to perform this check. The rational/algebraic norms necessarily should share some factors, because we are trying to form a square number).
4) New/second integration test.
5) DirectoryLocations now handles constructing and caching filenames.
AdamWhiteHat (Owner) commented:

@alexhiggins732 I merged your contributions.

alexhiggins732 (Collaborator, Author) commented:

The Sieve of Eratosthenes was my initial motivation for DotMpi. The library can take any static method as a callback. State is shared between child and parent processes by passing parameters and returning results, which are serialized over an IPC connection under the hood. Of course, you can share memory in other ways as well (disk, MemoryCache, DB).

For setting processor affinity: it can't be done for tasks or for threads within a process; processor affinity must be set on the entire process itself. Also, there really isn't a need to refactor your code into separate external libraries, but you certainly can use them. In fact, there appears to be a small startup performance boost, at least in .NET Core, when DotMpi launches a child process that is the same executable as the one already running.
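
As a minimal sketch of what that looks like in .NET (ProcessorAffinity is the standard System.Diagnostics.Process property; the mask value is just an example):

```csharp
using System;
using System.Diagnostics;

public static class AffinityExample
{
    public static void PinToFirstCore()
    {
        Process current = Process.GetCurrentProcess();

        // Bit i set => the process may run on logical processor i.
        // 0x1 pins the entire process (all of its threads) to CPU 0.
        // Supported on Windows and Linux; not on macOS.
        current.ProcessorAffinity = (IntPtr)0x1;
    }
}
```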


alexhiggins732 (Collaborator, Author) commented:


I'll take some time to look at what was merged. I went off on a tangent and coded all kinds of functionality for this library and for others we might look to for inspiration, as well as some libraries I am separately working on that could be incorporated here.
