Question: Transparent Proxy/Web Gateway? #66

Closed
atinm opened this issue Dec 1, 2024 · 24 comments
Labels
enhancement New feature or request

Comments

atinm commented Dec 1, 2024

Pingap seems closely tied to the idea of reverse-proxied, load-balanced upstreams. I was wondering how I could extend it to also support transparent proxying of upstreams that are separate from the load-balanced reverse proxy upstreams.

That is, if transparent proxying is enabled in a new config section, pingap would first look up the SNI/Host/authority header among the reverse-proxied locations/upstreams. If there is no match, it would generate a dynamic certificate (possibly a wildcard) for the upstream, signed with a configured transparent proxy server cert and key (which has been added to the browser's trusted certs), send that certificate back to the downstream browser/app, make the upstream request, and connect the upstream and downstream connections. The pingora examples include a basic example of such a gateway, and I also built an example (https://github.com/atinm/pingora/blob/transparent_proxy/pingora-proxy/examples/transparent_proxy.rs) that uses pingora to generate wildcard certs for the upstreams being transparently proxied, but it would be great to extend pingap to handle both the reverse and transparent proxy use cases.
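The lookup order described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Route`, `route`, and the `locations` map are hypothetical names, not pingap or pingora types, and certificate generation and the actual proxying are left out.

```rust
use std::collections::HashMap;

/// Outcome of routing one incoming connection (hypothetical type).
#[derive(Debug, PartialEq)]
enum Route {
    /// The host matched a configured reverse-proxy location.
    Reverse { upstream: String },
    /// No match, transparent mode on: proxy to the host the client asked for.
    Transparent { host: String },
    /// No match and transparent mode off.
    Rejected,
}

/// Decide how to handle a connection from its SNI/Host value.
/// `locations` maps a configured host name to an upstream name.
fn route(host: &str, locations: &HashMap<String, String>, transparent: bool) -> Route {
    let host = host.to_ascii_lowercase();
    if let Some(upstream) = locations.get(&host) {
        Route::Reverse { upstream: upstream.clone() }
    } else if transparent {
        Route::Transparent { host }
    } else {
        Route::Rejected
    }
}

fn main() {
    let mut locations = HashMap::new();
    locations.insert("app.example.com".to_string(), "appUpstream".to_string());

    // A configured host goes to its reverse-proxy upstream.
    println!("{:?}", route("app.example.com", &locations, true));
    // Anything else falls through to the transparent path.
    println!("{:?}", route("cn.bing.com", &locations, true));
}
```

In the transparent case the dynamic certificate for the requested host would be generated (or fetched from a cache) before the TLS handshake completes.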

vicanso added the question label (Further information is requested) Dec 2, 2024
vicanso (Owner) commented Dec 2, 2024

I will consider supporting it in the future, but it is not in the immediate plan.

vicanso self-assigned this Dec 2, 2024
atinm (Author) commented Dec 6, 2024

Thanks! If you have the time, could you point me to where you would change or add code? Maybe I could help. I was thinking that, where the code checks for upstream locations and healthy upstreams, an else clause could be added that checks whether transparent proxy is enabled, generates a new dynamic cert for the upstream, and returns the upstream. The config-parsing checks that require a location would need to be skipped when transparent proxy is enabled.

vicanso added the enhancement label (New feature or request) and removed the question label (Further information is requested) Dec 6, 2024
vicanso (Owner) commented Dec 7, 2024

@atinm The latest commit already supports transparent proxy. The TOML config:

[upstreams.transparentUpstream]
addrs = ["127.0.0.1"]
discovery = "transparent"
sni = "$host"

[locations.transparentLocation]
upstream = "transparentUpstream"

[servers.test]
addr = "127.0.0.1:443"
locations = ["transparentLocation"]
global_certificates = true

[certificates.medev]
tls_cert = """
-----BEGIN CERTIFICATE-----
MIIEljCCAv6gAwIBAgIQeYUdeFj3gpzhQes3aGaMZTANBgkqhkiG9w0BAQsFADCB
pTEeMBwGA1UEChMVbWtjZXJ0IGRldmVsb3BtZW50IENBMT0wOwYDVQQLDDR4aWVz
aHV6aG91QHhpZXNodXpob3VzLU1hY0Jvb2stQWlyLmxvY2FsICjosKLmoJHmtLIp
MUQwQgYDVQQDDDtta2NlcnQgeGllc2h1emhvdUB4aWVzaHV6aG91cy1NYWNCb29r
LUFpci5sb2NhbCAo6LCi5qCR5rSyKTAeFw0yMzA5MjQxMzA1MjdaFw0yNTEyMjQx
MzA1MjdaMGgxJzAlBgNVBAoTHm1rY2VydCBkZXZlbG9wbWVudCBjZXJ0aWZpY2F0
ZTE9MDsGA1UECww0eGllc2h1emhvdUB4aWVzaHV6aG91cy1NYWNCb29rLUFpci5s
b2NhbCAo6LCi5qCR5rSyKTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEB
ALuJ8lYEj9uf4iE9hguASq7re87Np+zJc2x/eqr1cR/SgXRStBsjxqI7i3xwMRqX
AuhAnM6ktlGuqidl7D9y6AN/UchqgX8AetslRJTpCcEDfL/q24zy0MqOS0FlYEgh
s4PIjWsSNoglBDeaIdUpN9cM/64IkAAtHndNt2p2vPfjrPeixLjese096SKEnZM/
xBdWF491hx06IyzjtWKqLm9OUmYZB9d/gDGnDsKpqClw8m95opKD4TBHAoE//WvI
m1mZnjNTNR27vVbmnc57d2Lx2Ib2eqJG5zMsP2hPBoqS8CKEwMRFLHAcclNkI67U
kcSEGaWgr15QGHJPN/FtjDsCAwEAAaN+MHwwDgYDVR0PAQH/BAQDAgWgMBMGA1Ud
JQQMMAoGCCsGAQUFBwMBMB8GA1UdIwQYMBaAFJo0y9bYUM/OuenDjsJ1RyHJfL3n
MDQGA1UdEQQtMCuCBm1lLmRldoIJbG9jYWxob3N0hwR/AAABhxAAAAAAAAAAAAAA
AAAAAAABMA0GCSqGSIb3DQEBCwUAA4IBgQAlQbow3+4UyQx+E+J0RwmHBltU6i+K
soFfza6FWRfAbTyv+4KEWl2mx51IfHhJHYZvsZqPqGWxm5UvBecskegDExFMNFVm
O5QixydQzHHY2krmBwmDZ6Ao88oW/qw4xmMUhzKAZbsqeQyE/uiUdyI4pfDcduLB
rol31g9OFsgwZrZr0d1ZiezeYEhemnSlh9xRZW3veKx9axgFttzCMmWdpGTCvnav
ZVc3rB+KBMjdCwsS37zmrNm9syCjW1O5a1qphwuMpqSnDHBgKWNpbsgqyZM0oyOc
9Bkja+BV5wFO+4zH5WtestcrNMeoQ83a5lI0m42u/bUEJ/T/5BQBSFidNuvS7Ylw
IZpXa00xvlnm1BOHOfRI4Ehlfa5jmfcdnrGkQLGjiyygQtKcc7rOXGK+mSeyxwhs
sIARwslSQd4q0dbYTPKvvUHxTYiCv78vQBAsE15T2GGS80pAFDBW9vOf3upANvOf
EHjKf0Dweb4ppL4ddgeAKU5V0qn76K2fFaE=
-----END CERTIFICATE-----"""
tls_key = """
-----BEGIN PRIVATE KEY-----
MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC7ifJWBI/bn+Ih
PYYLgEqu63vOzafsyXNsf3qq9XEf0oF0UrQbI8aiO4t8cDEalwLoQJzOpLZRrqon
Zew/cugDf1HIaoF/AHrbJUSU6QnBA3y/6tuM8tDKjktBZWBIIbODyI1rEjaIJQQ3
miHVKTfXDP+uCJAALR53Tbdqdrz346z3osS43rHtPekihJ2TP8QXVhePdYcdOiMs
47Viqi5vTlJmGQfXf4Axpw7CqagpcPJveaKSg+EwRwKBP/1ryJtZmZ4zUzUdu71W
5p3Oe3di8diG9nqiRuczLD9oTwaKkvAihMDERSxwHHJTZCOu1JHEhBmloK9eUBhy
TzfxbYw7AgMBAAECggEALjed0FMJfO+XE+gMm9L/FMKV3W5TXwh6eJemDHG2ckg3
fQpQtouHjT2tb3par5ndro0V19tBzzmDV3hH048m3I3JAuI0ja75l/5EO4p+y+Fn
IgjoGIFSsUiGBVTNeJlNm0GWkHeJlt3Af09t3RFuYIIklKgpjNGRu4ccl5ExmslF
WHv7/1dwzeJCi8iOY2gJZz6N7qHD95VkgVyDj/EtLltONAtIGVdorgq70CYmtwSM
9XgXszqOTtSJxle+UBmeQTL4ZkUR0W+h6JSpcTn0P9c3fiNDrHSKFZbbpAhO/wHd
Ab4IK8IksVyg+tem3m5W9QiXn3WbgcvjJTi83Y3syQKBgQD5IsaSbqwEG3ruttQe
yfMeq9NUGVfmj7qkj2JiF4niqXwTpvoaSq/5gM/p7lAtSMzhCKtlekP8VLuwx8ih
n4hJAr8pGfyu/9IUghXsvP2DXsCKyypbhzY/F2m4WNIjtyLmed62Nt1PwWWUlo9Q
igHI6pieT45vJTBICsRyqC/a/wKBgQDAtLXUsCABQDTPHdy/M/dHZA/QQ/xU8NOs
ul5UMJCkSfFNk7b2etQG/iLlMSNup3bY3OPvaCGwwEy/gZ31tTSymgooXQMFxJ7G
1S/DF45yKD6xJEmAUhwz/Hzor1cM95g78UpZFCEVMnEmkBNb9pmrXRLDuWb0vLE6
B6YgiEP6xQKBgBOXuooVjg2co6RWWIQ7WZVV6f65J4KIVyNN62zPcRaUQZ/CB/U9
Xm1+xdsd1Mxa51HjPqdyYBpeB4y1iX+8bhlfz+zJkGeq0riuKk895aoJL5c6txAP
qCJ6EuReh9grNOFvQCaQVgNJsFVpKcgpsk48tNfuZcMz54Ii5qQlue29AoGAA2Sr
Nv2K8rqws1zxQCSoHAe1B5PK46wB7i6x7oWUZnAu4ZDSTfDHvv/GmYaN+yrTuunY
0aRhw3z/XPfpUiRIs0RnHWLV5MobiaDDYIoPpg7zW6cp7CqF+JxfjrFXtRC/C38q
MftawcbLm0Q6MwpallvjMrMXDwQrkrwDvtrnZ4kCgYEA0oSvmSK5ADD0nqYFdaro
K+hM90AVD1xmU7mxy3EDPwzjK1wZTj7u0fvcAtZJztIfL+lmVpkvK8KDLQ9wCWE7
SGToOzVHYX7VazxioA9nhNne9kaixvnIUg3iowAz07J7o6EU8tfYsnHxsvjlIkBU
ai02RHnemmqJaNepfmCdyec=
-----END PRIVATE KEY-----"""
is_default = true

curl -kv --resolve '*:443:127.0.0.1' 'https://cn.bing.com/'

atinm (Author) commented Dec 7, 2024

That is awesome, thank you!

atinm (Author) commented Dec 7, 2024

Maybe add the transparent proxy toml to examples and close the issue whenever you think it is ready. I will test this myself as soon as I can get back on my computer. Thank you so much for getting to it so quickly!

adammakowskidev commented

@vicanso Will a new version be released today?

vicanso (Owner) commented Dec 8, 2024

> @vicanso Will a new version be released today?

Yes.

atinm (Author) commented Dec 9, 2024

@vicanso this is close, but in a transparent proxy scenario pingap should generate dynamic wildcard certificates signed by a trusted certificate (which can be self-signed), instead of using a single wildcard certificate marked as global. Otherwise browsers, and even curl without -k, will complain when the SAN doesn't match.

For example, you would have a self-signed CA certificate loaded into pingap via config (call it is_ca true, or I guess you could use is_global). Then, if you try https://cn.bing.com, pingap should look for a *.bing.com cert. If none is found, it should generate a new ServerAuth certificate with SNI wildcard.bing.com and *.bing.com added to the SAN, signed by the global CA certificate that has been added to the browser's or system's trusted certs and to the pingap config. It would store the new ServerAuth certificate in dynamic_certificates_map for next time and use it for this connection. If you then try https://www.yahoo.com, it would generate a ServerAuth cert for *.yahoo.com the same way. This way you do not need to pre-generate a wildcard cert for every possible site, and you can use pingap as a web gateway, adding plugins that provide malware protection, etc., in the request and response path.

That is what I used rcgen for in my hacked-together example test code (https://github.com/atinm/pingora/blob/94f58da58b703b4c1c3b93365c6b7f9b5e92c179/pingora-proxy/examples/transparent_proxy.rs#L257), but I think it could be done better.
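As a rough sketch of the look-up-then-generate flow described above: the code below derives the wildcard name from the requested host and generates a certificate only on a cache miss. `Cert`, `CertCache`, and `wildcard_name` are hypothetical names for illustration; the real signing step (done with rcgen in the linked example) is replaced by a stand-in struct. Note the simplification that hosts with fewer than three labels are left unchanged, which does not account for public suffixes such as co.uk.

```rust
use std::collections::HashMap;

/// Replace the left-most label with `*`: "cn.bing.com" -> "*.bing.com".
/// Hosts with fewer than three labels are returned unchanged, because a
/// wildcard such as "*.com" would be far too broad.
fn wildcard_name(host: &str) -> String {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    if let Some((_, parent)) = host.split_once('.') {
        if parent.contains('.') {
            return format!("*.{}", parent);
        }
    }
    host
}

/// Stand-in for a generated certificate (a real one holds cert and key).
#[derive(Debug, Clone, PartialEq)]
struct Cert {
    san: String,
}

/// Hypothetical cache keyed by wildcard name.
struct CertCache {
    certs: HashMap<String, Cert>,
    generated: usize,
}

impl CertCache {
    fn new() -> Self {
        CertCache { certs: HashMap::new(), generated: 0 }
    }

    /// Return the cached certificate for `host`, generating it on a miss.
    fn get_or_generate(&mut self, host: &str) -> &Cert {
        let name = wildcard_name(host);
        if !self.certs.contains_key(&name) {
            // Assumption: this is where the CA would sign a new ServerAuth cert.
            self.generated += 1;
            let cert = Cert { san: name.clone() };
            self.certs.insert(name.clone(), cert);
        }
        &self.certs[&name]
    }
}

fn main() {
    let mut cache = CertCache::new();
    for host in ["cn.bing.com", "www.bing.com", "www.yahoo.com"] {
        let cert = cache.get_or_generate(host);
        println!("{} -> {}", host, cert.san);
    }
    println!("generated {} certificate(s)", cache.generated);
}
```

With this keying, cn.bing.com and www.bing.com share one *.bing.com entry, so only two certificates are generated for the three hosts.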

vicanso (Owner) commented Dec 9, 2024

I'm not familiar with transparent proxying, so I made a mistake. I thought the certificates were pre-generated, but according to your example they are generated dynamically. I will refer to it later to see how to implement this logic.

atinm (Author) commented Dec 10, 2024

@vicanso you are very close. All that is missing is setting a CA certificate in the config and dynamically generating ServerAuth certificates signed by that CA certificate. Thank you so much for looking into it.

vicanso (Owner) commented Dec 15, 2024

@atinm You can try using the latest commit, which should meet your expectations.

The certificate config:

[certificates.rootCa]
is_default = true
is_root = true
tls_cert = """
-----BEGIN CERTIFICATE-----
......
-----END CERTIFICATE-----"""
tls_key = """
-----BEGIN PRIVATE KEY-----
.....
-----END PRIVATE KEY-----"""

curl -kv --resolve '*:443:127.0.0.1' 'https://cn.bing.com/'
curl -kv --resolve '*:443:127.0.0.1' 'https://www.baidu.com/'

atinm (Author) commented Dec 15, 2024

@vicanso Thank you! I will try it out today, but the commit looks good.

atinm (Author) commented Dec 16, 2024

@vicanso I have left some comments on the f1b2f6f commit. I made my suggested changes in my local fork of pingap to test it all out (except for saving the generated certs for reuse and checking cert expiry) in atinm@5ec9fb9. I can send a pull request if you want; I didn't because I am sure you can do this better than I can, and can add saving the wildcard cert and key to an admin-specified directory, loading them on startup, expiry checking, etc., as I wrote in the comments on your commit.

I then added my CA certificate to Keychain Access and marked it as trusted on my local macOS machine:
[Screenshot 2024-12-15 at 8 28 18 PM]

Then I ran my copy of pingap in Docker, exposing container port 443 as 8443 on my localhost, and used pf rdr rules to redirect port 443 to 8443 on my localhost (I have some functions defined in my .zshrc to help me):

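# Redirect local ports 443 and 80 to the given ports ($1 for HTTPS, $2 for HTTP)
# by loading pf rdr rules from stdin and enabling pf.
function port_forward {
    echo "
rdr pass inet proto tcp from any to any port 443 -> 127.0.0.1 port $1
rdr pass inet proto tcp from any to any port 80 -> 127.0.0.1 port $2
" | sudo pfctl -ef -
}

# Flush all pf rules and reload the default /etc/pf.conf.
function port_revert {
    sudo pfctl -F all -f /etc/pf.conf
}

# Show the currently loaded NAT/redirect rules.
function port_show {
    sudo pfctl -s nat
}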

So I can forward using:
# port_forward 8443 8080
and revert the changes using:
# port_revert

I also have an alias for chrome that maps all requests to 127.0.0.1:

alias chrome-tp='/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --user-data-dir=~/tp-chrome-profile --host-resolver-rules="MAP * 127.0.0.1"'

I launched Chrome using this alias and went to https://www.yahoo.com (I also set enabled_h2=true on the server in my transparent-proxy.toml to see it use h2, since www.yahoo.com is HTTP/2 enabled). It generated the correct certificate for *.yahoo.com, signed by my CA!

[Screenshot 2024-12-15 at 8 36 33 PM]

I am able to see the whole page, with no untrusted certificate errors. I think this is almost ready to close!

Thank you!

vicanso (Owner) commented Dec 16, 2024

Sorry, PRs are not accepted before version 1.0.

In my opinion, the following points need to be improved:

  • fix OrganizationName
  • enhance subject_alt_names
  • get not_before and not_after from ca
  • use ca instead of root certificate
  • validate expiry date of certificate

atinm (Author) commented Dec 17, 2024

@vicanso Yes, that is correct. You can keep setting not_before to now_utc as you do now: the certificate is being created at that moment, so its not_before should not be earlier than its creation time, and the CA could have been created a while ago.
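One possible way to reconcile the two points (not_before set to the current time, not_after taken from the CA) is sketched below. It is an assumption rather than what pingap does: `leaf_validity`, `Validity`, and the `lifetime_secs` parameter are hypothetical, and times are plain Unix seconds to keep the example free of dependencies.

```rust
/// Validity window of a generated certificate, in Unix seconds.
#[derive(Debug, PartialEq)]
struct Validity {
    not_before: u64,
    not_after: u64,
}

/// Start the window at `now` and end it at `now + lifetime_secs`,
/// but never later than the CA's own not_after.
/// Returns None if the CA has already expired.
fn leaf_validity(now: u64, lifetime_secs: u64, ca_not_after: u64) -> Option<Validity> {
    if now >= ca_not_after {
        return None;
    }
    let not_after = now.saturating_add(lifetime_secs).min(ca_not_after);
    Some(Validity { not_before: now, not_after })
}

fn main() {
    let day = 24 * 60 * 60;
    // The CA expires in 30 days, so a requested 90-day lifetime is clamped.
    let now = 1_734_400_000;
    let ca_not_after = now + 30 * day;
    println!("{:?}", leaf_validity(now, 90 * day, ca_not_after));
}
```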

I would also save the generated cert and key to a user-specified directory from the config, named, for example, wildcard.yahoo.com.pem and wildcard.yahoo.com.key if the certificate was generated for www.yahoo.com. That way you can reload them on restart instead of regenerating certificates that already exist. You could also use an in-memory LRU cache for the generated certs: just going to https://www.yahoo.com in Chrome generates 320 certificates, and you may not want to keep hundreds of them in memory for the life of the pingap process. An evicted cert could be reloaded later if needed by checking the directory before generating a new cert, and overwritten if it has expired. I was thinking that you could add a self_signed_certs_dir option to the CA cert config, giving the directory in which to save the certs and keys generated by that CA cert.
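The proposed file naming could look something like this. It is a minimal sketch: `cert_paths` is a hypothetical helper, and the directory argument stands for the suggested self_signed_certs_dir setting, which is only a proposal in this thread.

```rust
use std::path::{Path, PathBuf};

/// Map a requested host to the (.pem, .key) paths of its wildcard certificate.
/// The left-most label is replaced by the literal "wildcard", since `*` is
/// awkward in file names: "www.yahoo.com" -> "wildcard.yahoo.com.pem".
fn cert_paths(dir: &Path, host: &str) -> (PathBuf, PathBuf) {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    let base = match host.split_once('.') {
        // Only wildcard hosts that have at least three labels.
        Some((_, parent)) if parent.contains('.') => format!("wildcard.{}", parent),
        _ => host.clone(),
    };
    (
        dir.join(format!("{}.pem", base)),
        dir.join(format!("{}.key", base)),
    )
}

fn main() {
    // Hypothetical value of the proposed self_signed_certs_dir option.
    let dir = Path::new("/var/lib/pingap/certs");
    let (pem, key) = cert_paths(dir, "www.yahoo.com");
    println!("{}", pem.display());
    println!("{}", key.display());

    // Before generating, a caller would check whether the files already exist.
    println!("already on disk: {}", pem.exists() && key.exists());
}
```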

Thank you again for working on this!

vicanso (Owner) commented Dec 17, 2024

I have a question: is it necessary to save the certificates?

  • Generating certificates is not slow
  • Certificates do not take up much memory
  • Certificates that have not been used for a long time or have expired can be cleaned up regularly through the background service

vicanso (Owner) commented Dec 17, 2024

@atinm

By the way, what is the purpose of the following code?

SanType::DnsName(Ia5String::try_from("localhost").unwrap()),
SanType::IpAddress(std::net::IpAddr::V4(std::net::Ipv4Addr::new(127, 0, 0, 1))),
SanType::IpAddress(std::net::IpAddr::V6(std::net::Ipv6Addr::new(
    0, 0, 0, 0, 0, 0, 0, 1,
))),

atinm (Author) commented Dec 17, 2024

@vicanso When you are generating hundreds of certificates per web page (as happens on sites like www.yahoo.com, which link to hundreds of other domains to fetch images, audio, trackers, etc.), you may not want to keep all of them in memory. Just browsing Yahoo generated hundreds of certificates for me; if I then go to https://cnn.com, it is hundreds more, and so on. The generation time adds up, and if pingap is used in a performance-sensitive setting such as malware detection, where it sits inline with user traffic going to the web, you may not want to regenerate certificates you have already generated. That said, I haven't actually compared the performance of reloading a generated cert from disk against generating one on the fly, so maybe the performance is acceptable. Similarly, keeping hundreds or thousands of certificates in memory might not be what you want, and an LRU cache that keeps the most-used certs in memory might be better (it would still need room for hundreds or thousands of them).
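To make the LRU idea concrete, here is a deliberately small sketch using only the standard library. `LruCerts` is a hypothetical type, certificates are represented as plain strings, and lookups are linear in the number of entries, so a real implementation would likely use a dedicated LRU structure instead.

```rust
use std::collections::VecDeque;

/// Tiny LRU cache: the front of the deque is the most recently used entry.
struct LruCerts {
    capacity: usize,
    entries: VecDeque<(String, String)>,
}

impl LruCerts {
    fn new(capacity: usize) -> Self {
        LruCerts { capacity, entries: VecDeque::new() }
    }

    /// Look up a certificate and mark it as most recently used.
    fn get(&mut self, name: &str) -> Option<&String> {
        let pos = self.entries.iter().position(|(n, _)| n.as_str() == name)?;
        let entry = self.entries.remove(pos)?;
        self.entries.push_front(entry);
        self.entries.front().map(|(_, cert)| cert)
    }

    /// Insert or replace a certificate, evicting the least recently used
    /// entry when the cache is over capacity.
    fn put(&mut self, name: &str, cert: String) {
        if let Some(pos) = self.entries.iter().position(|(n, _)| n.as_str() == name) {
            self.entries.remove(pos);
        }
        self.entries.push_front((name.to_string(), cert));
        if self.entries.len() > self.capacity {
            self.entries.pop_back();
        }
    }
}

fn main() {
    let mut lru = LruCerts::new(2);
    lru.put("*.yahoo.com", "cert-yahoo".to_string());
    lru.put("*.bing.com", "cert-bing".to_string());
    lru.get("*.yahoo.com"); // touch: *.bing.com is now the eviction candidate
    lru.put("*.cnn.com", "cert-cnn".to_string());
    println!("*.bing.com cached: {}", lru.get("*.bing.com").is_some());
    println!("*.yahoo.com cached: {}", lru.get("*.yahoo.com").is_some());
}
```

An evicted entry would then be regenerated, or reloaded from disk if certificates are persisted, the next time its host is requested.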

Adding localhost and 127.0.0.1 to the SANs is useful when debugging the certs, but it is not necessary.

Thanks!

atinm (Author) commented Dec 18, 2024

@vicanso I will see if I can test the performance difference between loading certs from disk and generating them. If the performance is acceptable, then just regenerating missing certs might be OK.

Similarly, maybe an LRU isn't needed; with virtual memory, pingap could just keep the certs in memory in the map, as you do now. But you need to add locking unless the map is thread-safe (I haven't checked), since many certs might be generated across parallel requests from different users, possibly even regenerating the same cert in two threads for two parallel requests.
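The duplicate-generation concern can be illustrated with a plain mutex around the map. This sketch uses only the standard library and hypothetical names (`SharedCerts`, `generate_in_parallel`); it is not how pingap stores certificates. Holding the lock while generating keeps the example simple but serializes generation, which a real implementation might want to avoid.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Certificate map shared between request threads.
struct SharedCerts {
    certs: Mutex<HashMap<String, Arc<String>>>,
    /// Counts how often a certificate was actually generated.
    generated: AtomicUsize,
}

impl SharedCerts {
    fn new() -> Self {
        SharedCerts {
            certs: Mutex::new(HashMap::new()),
            generated: AtomicUsize::new(0),
        }
    }

    /// Check and insert under one lock, so two parallel requests for the
    /// same name cannot both generate a certificate.
    fn get_or_generate(&self, name: &str) -> Arc<String> {
        let mut certs = self.certs.lock().unwrap();
        if let Some(cert) = certs.get(name) {
            return Arc::clone(cert);
        }
        self.generated.fetch_add(1, Ordering::SeqCst);
        let cert = Arc::new(format!("cert for {}", name)); // stand-in for signing
        certs.insert(name.to_string(), Arc::clone(&cert));
        cert
    }
}

/// Request the same certificate from several threads at once and report
/// how many times it was generated.
fn generate_in_parallel(threads: usize, name: &'static str) -> usize {
    let shared = Arc::new(SharedCerts::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                shared.get_or_generate(name);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    shared.generated.load(Ordering::SeqCst)
}

fn main() {
    let count = generate_in_parallel(8, "*.yahoo.com");
    println!("generated {} time(s) for 8 parallel requests", count);
}
```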

A background service that runs every 12 hours and deletes certs that will expire within 24 hours might also be enough, as you suggested.
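A single sweep of such a background service might look like the sketch below. `purge_expiring` is a hypothetical function, and the map stores only each certificate's expiry time; scheduling the sweep every 12 hours is left to the caller.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Remove every certificate whose expiry falls within `margin` of `now`
/// (already expired entries included). Returns how many were removed.
fn purge_expiring(
    certs: &mut HashMap<String, SystemTime>,
    now: SystemTime,
    margin: Duration,
) -> usize {
    let before = certs.len();
    let cutoff = now + margin;
    certs.retain(|_, not_after| *not_after > cutoff);
    before - certs.len()
}

fn main() {
    let now = SystemTime::now();
    let hour = Duration::from_secs(60 * 60);

    let mut certs = HashMap::new();
    certs.insert("*.yahoo.com".to_string(), now + 48 * hour);
    certs.insert("*.bing.com".to_string(), now + hour);

    // One sweep with the 24-hour margin mentioned above.
    let removed = purge_expiring(&mut certs, now, 24 * hour);
    println!("removed {} certificate(s), {} left", removed, certs.len());
}
```

A purged certificate is simply regenerated the next time its host is requested.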

vicanso (Owner) commented Dec 18, 2024

When the performance test results come out, could you please let me know? Also, I use arc-swap for updating the certificate map, which is thread-safe.

atinm (Author) commented Dec 19, 2024

@vicanso Based on my tests, reading a generated server_cert.pem and its key from disk and creating a DynamicCertificate from them is faster than generating a new DynamicCertificate. I am getting [1.1668 ms 1.1750 ms 1.1850 ms] to generate a brand new DynamicCertificate every time vs [66.069 µs 66.473 µs 66.956 µs] to read an already generated cert and key from disk, create a CertificateConf, and use parse_certificate to build a DynamicCertificate from it. I don't think it is worth implementing storage on disk for a difference of about 1 ms per certificate. Even with 300 certificates per web page, that is only about 0.3 seconds, and the server response is a lot slower than that.
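The estimate above can be reproduced from the middle values of the two measurements. The helper name `extra_ms_per_page` is made up for this illustration, and the result assumes every certificate on the page would otherwise have been loaded from disk.

```rust
/// Extra time per page, in milliseconds, when every certificate is
/// generated instead of being loaded from disk.
fn extra_ms_per_page(generate_ms: f64, load_ms: f64, certs: u32) -> f64 {
    (generate_ms - load_ms) * f64::from(certs)
}

fn main() {
    // Middle values from the measurements: 1.1750 ms vs 66.473 µs.
    let extra = extra_ms_per_page(1.1750, 0.066473, 300);
    println!("about {:.0} ms extra for 300 certificates", extra);
}
```

That works out to roughly a third of a second per page, in line with the 0.3 seconds quoted above.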

I tested with atinm@40bdb7c and it is working great for me with no errors for pages like https://cnn.com, https://www.yahoo.com that have a lot of certificates and domains they link to.

atinm (Author) commented Dec 20, 2024

@vicanso your latest commit looks great. The example configuration for transparent-proxy.toml can be updated to use the new is_ca configuration but after that I think this is good to close. I am able to transparently proxy anything, including streaming video on cnn.com using pingap now.

vicanso (Owner) commented Dec 25, 2024

@atinm Do you have other questions?

atinm (Author) commented Dec 29, 2024

@vicanso I think this is good for what I was looking for; this can be closed!

atinm closed this as completed Dec 29, 2024
3 participants