Can a large file be hashed down to 32 bytes, and then reconstructed from the hash?











We can hash a file or data using multihash or SHA-256, but can we retrieve the original data or file from the hash?



Are there any methods to retrieve the original file or data from a hash of it without using IPFS?



Or is there any encryption method that encrypts a 5 MB file into 32 bytes of hash-like output, such that the original file can be retrieved from those 32 bytes?







hash compression






edited yesterday









Ilmari Karonen

asked yesterday









Anu Davis

  • 56




    By the pigeonhole principle this is impossible to achieve.
    – Maeher
    yesterday






  • 4




    It may actually be possible, although this method is error-prone and impractical. Run all Turing machines in parallel, with short Turing machines getting more time slices (say, the nth gets 2^-n). Take the first machine that halts with an output string whose hash matches yours. This algorithm biases toward simpler strings that have your hash. If your file has lower Kolmogorov complexity than any other string with the same hash, there is possibly a reasonable chance of obtaining the file first.
    – Solomonoff's Secret
    yesterday






  • 7




    There's a (rather small) subset of 5 MB files that can probably be compressed down to 32 bytes :) Like, 5 MB of zeros...
    – Seva Alekseyev
    yesterday






  • 3




    Possible duplicate of Would it be possible to generate the original data from a SHA-512 checksum?
    – pipe
    yesterday






  • 4




    @AnuDavis By the pigeonhole principle that is impossible to achieve.
    – immibis
    yesterday
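Seva Alekseyev's point about 5 MB of zeros can be checked with a quick experiment. The sketch below uses Python's standard `zlib` module as a stand-in for "some compressor" (no particular compressor is named in the thread) and assumes "5 MB" means 5 MiB. Highly regular data shrinks enormously, while random data does not shrink at all, which is the pigeonhole argument showing up in practice. DEFLATE has a bounded maximum compression ratio, so even the zeros will not get anywhere near 32 bytes with zlib; only a scheme built around that one special file could.

```python
import os
import zlib

SIZE = 5 * 1024 * 1024  # assumption: "5 MB" taken as 5 MiB

zeros = bytes(SIZE)        # a highly regular file: all zero bytes
noise = os.urandom(SIZE)   # a typical random-looking file

zeros_c = zlib.compress(zeros, 9)
noise_c = zlib.compress(noise, 9)

print(f"zeros: {SIZE} -> {len(zeros_c)} bytes")
print(f"noise: {SIZE} -> {len(noise_c)} bytes")

# Both are lossless: decompression restores the exact input.
assert zlib.decompress(zeros_c) == zeros
assert zlib.decompress(noise_c) == noise
```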














5 Answers




















accepted










No, there is no way to compress (or hash or encrypt or whatever) an arbitrary 5 MB file into a 32-byte hash and then reconstruct the original file from the hash alone.



This is simply because there are many more possible 5 MB files than there are 32-byte hashes. Whatever hashing, compression or other algorithm you use, it must therefore map many different 5 MB files to the same 32-byte hash, and so, given only the 32-byte hash, there is no way to tell which of those files it was created from.
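To see the pigeonhole argument concretely, the sketch below shrinks the "hash" to a single byte by keeping only the first byte of SHA-256. This `tiny_hash` is a hypothetical toy for illustration, not something used in practice. With only 256 possible outputs, any 257 distinct inputs are guaranteed to contain a collision, so the search below always succeeds.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Toy 8-bit hash: the first byte of SHA-256 (illustration only)."""
    return hashlib.sha256(data).digest()[0]

def find_collision(num_inputs: int = 257):
    """Hash distinct inputs until two of them share a tiny_hash value."""
    seen = {}  # maps hash value -> first input that produced it
    for i in range(num_inputs):
        data = f"file number {i}".encode()
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data, h
        seen[h] = data
    # Unreachable when num_inputs > 256: there are only 256 possible hashes.
    raise RuntimeError("no collision found")

a, b, h = find_collision()
print(f"{a!r} and {b!r} both hash to {h}")
```

Given only `h`, nothing tells you whether the "original" was `a` or `b`; a real 32-byte hash has the same problem, just with astronomically more candidates per hash value.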



In fact, the same thing already happens if you hash 33-byte files into 32-byte hashes and then try to reconstruct the original files from the hashes. Since there are 256 times as many 33-byte files as there are 32-byte hashes, there must be many different files that share the same hash. With 5 MB files, it's many, many, many times worse.
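The counting in this paragraph is easy to check with Python's arbitrary-precision integers. The sketch treats "5 MB" as 5 × 2^20 bytes (an assumption; the exact figure does not change the conclusion) and, for that case, computes only the exponent, since the count itself would have millions of decimal digits.

```python
HASH_BYTES = 32

def count_files(num_bytes: int) -> int:
    """Number of distinct files of exactly num_bytes bytes (256 choices per byte)."""
    return 256 ** num_bytes

# 33-byte files vs. 32-byte hashes: 256 files per hash on average.
ratio_33 = count_files(33) // count_files(HASH_BYTES)
print(ratio_33)  # 256

# For 5 MB files the ratio is 2**excess_bits; only the exponent is computed.
FIVE_MB = 5 * 2**20
excess_bits = 8 * (FIVE_MB - HASH_BYTES)
print(f"each 32-byte hash covers 2**{excess_bits} five-megabyte files on average")
```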





So how can something like IPFS work, then?



Basically, it relies on the fact that even the number of possible 32-byte hashes is really huge*: much, much larger than the total number of actual files (of any length) that humans have ever created, or are ever likely to create. So, while we know that there must be many possible files that have the same 32-byte hash, the chance of actually finding two different files that happen to have the same hash by chance is so incredibly small that we can basically assume it will never happen.
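How small is "incredibly small"? A standard back-of-the-envelope estimate is the birthday (union) bound: if a hash behaves like a uniformly random function with b-bit output, the probability that any two of n distinct inputs collide is at most n(n−1)/2 divided by 2^b. The sketch below evaluates that bound for a 256-bit hash, modelling SHA-256 as random (an assumption). The file counts are hypothetical round numbers, not estimates of how many files actually exist.

```python
from fractions import Fraction

def collision_bound(n: int, bits: int = 256) -> Fraction:
    """Union (birthday) upper bound on P(any collision) among n random hashes."""
    pairs = n * (n - 1) // 2
    return Fraction(pairs, 2**bits)

# Hypothetical file counts, for illustration only.
for exponent in (40, 64, 80):
    n = 2**exponent
    p = collision_bound(n)
    print(f"n = 2**{exponent}: P(collision) <= {float(p):.3e}")
```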



(Also, cryptographic hash functions like SHA-256 are designed so that, hopefully, there are no practical ways to deliberately find files with the same hash more efficiently than by just hashing lots of files and hoping against all odds for a random collision.)



This means that if we have some kind of a (possibly distributed) database containing a bunch of files and their SHA-256 hashes, then we can be pretty sure that it will never actually contain two files with the same hash, even if that's theoretically possible.



Thus, as long as we have access to such a database, we can use the hash of any file in the database to look it up, and be almost 100% certain that we will only get one matching file back, not two or more. Technically, the probability of getting multiple matches is not quite exactly zero, but it's so incredibly small that it can be safely neglected in practice.
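The database idea can be sketched as a toy content-addressed store: each file is stored under its SHA-256 digest, and the digest is then used purely as a lookup key. The class and method names are hypothetical, and this is not how IPFS itself is implemented (IPFS wraps hashes in multihash-based content identifiers and spreads storage across peers); it only shows that the hash works as an address while the data itself still has to be stored somewhere.

```python
import hashlib

class ContentStore:
    """Toy in-memory content-addressed store (illustrative, not IPFS)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # 32-byte digest as 64 hex chars
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]  # raises KeyError if this content was never stored
        # Re-hash on retrieval so a corrupted or substituted blob is detected.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored data does not match its hash")
        return data

store = ContentStore()
key = store.put(b"five megabytes of something important")
print(key)
print(store.get(key))
```

Without the stored blobs, the 64-character key on its own is useless: the "reconstruction" comes entirely from the lookup, not from the hash.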





*) In fact, it's $2^{8 \times 32} = 2^{256}$, which equals 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936.
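The number in the footnote is the count of distinct 256-bit values, since a 32-byte hash has 8 × 32 = 256 bits. A one-line check in Python:

```python
n = 2 ** (8 * 32)  # number of distinct 32-byte values
print(f"{n:,}")
```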






edited 10 hours ago

answered yesterday

Ilmari Karonen























  • Comments are not for extended discussion; this conversation has been moved to chat.
    – SEJPM
    4 hours ago






  • 3




    Your first paragraph is technically incorrect. It is entirely possible to compress a 5MB file into 32 bytes. It just isn't possible to generalise it to all 5MB files (e.g. my algorithm can be If(text == completeWorksOfShakespeare) return stringThatIs32Bytes; else return text;)
    – JBentley
    4 hours ago
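JBentley's one-liner is close, but as written it cannot always be decoded: if some other input happens to equal `stringThatIs32Bytes`, the decompressor cannot tell the two cases apart. A minimal sketch of a decodable version, assuming a one-byte tag marks which branch was taken (the names `SPECIAL_FILE`, `compress` and `decompress` are hypothetical), shows the catch: the one special file shrinks to a single byte, and every other input grows by one byte.

```python
# Hypothetical stand-in for "the one file we care about": 5 MiB of zeros.
SPECIAL_FILE = b"\x00" * (5 * 1024 * 1024)

TAG_SPECIAL = b"\x00"  # output means "the input was SPECIAL_FILE"
TAG_LITERAL = b"\x01"  # output is the input, stored verbatim after the tag

def compress(data: bytes) -> bytes:
    if data == SPECIAL_FILE:
        return TAG_SPECIAL
    return TAG_LITERAL + data

def decompress(blob: bytes) -> bytes:
    tag, rest = blob[:1], blob[1:]
    if tag == TAG_SPECIAL and not rest:
        return SPECIAL_FILE
    if tag == TAG_LITERAL:
        return rest
    raise ValueError("not produced by compress()")

print(len(compress(SPECIAL_FILE)))  # 1
print(len(compress(b"hello")))      # 6
```

The extra byte on every other input is unavoidable for any lossless scheme: shrinking some inputs means others must grow.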

































No, it's not possible to retrieve the original input of a hash, because the input of a hash can be of almost any size (for SHA-256, anything shorter than $2^{64}$ bits, i.e. less than 2,097,152 tebibytes).
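The size limit for SHA-256 comes from its padding rule, which appends the message length as a 64-bit field, so messages must be shorter than 2^64 bits. Converting that bound into binary units:

```python
max_bits = 2**64           # SHA-256 messages must be shorter than this many bits
max_bytes = max_bits // 8  # i.e. 2**61 bytes
TIB = 2**40                # one tebibyte

print(max_bytes // TIB)    # 2097152 tebibytes, i.e. 2 exbibytes
```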



But the hash value always has a fixed length: SHA-256, for example, always produces a 256-bit value. That's why there must be multiple inputs that have the same hash value; see the pigeonhole principle.



That's also why you can never retrieve the original input: you can never be sure that a candidate you found really is the original input and not some other input that happens to have the same hash value.
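This can be demonstrated by brute force against a deliberately weakened hash. The sketch below truncates SHA-256 to 16 bits (a hypothetical toy; with the full 256-bit output this search is infeasible) and then searches for any input with the same truncated hash as a "secret" file. A match turns up quickly, but it is a different input, and nothing in the hash value says which one is the real original.

```python
import hashlib

def toy_hash(data: bytes, nbytes: int = 2) -> bytes:
    """SHA-256 truncated to nbytes bytes: a deliberately weak toy hash."""
    return hashlib.sha256(data).digest()[:nbytes]

original = b"the original 5 MB file (pretend)"
target = toy_hash(original)

def find_preimage(target: bytes, limit: int = 1 << 22) -> bytes:
    """Try candidate inputs until one hashes to target (about 65536 tries expected)."""
    for i in range(limit):
        candidate = b"candidate-%d" % i
        if toy_hash(candidate) == target:
            return candidate
    raise RuntimeError("no preimage found within limit")

found = find_preimage(target)
print(found, found == original)
```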





























  • Thanks for the answer. Is there any encryption method which encrypts a 5 MB file and outputs hash-like, 32-byte content, so that we can retrieve the file from that 32-byte content?
    – Anu Davis
    yesterday








  • 2




    No. Like Maeher already said: it is impossible. You can see this by applying the pigeonhole principle: you have $2^{256}$ possible 32-byte hashes/encryptions/encodings/pumpernickels, but many, many more possible plaintexts. It would imply that most encryptions have multiple decryptions, just like this answer explains.
    – Ruben De Smet
    yesterday












  • Thanks for the answer
    – Anu Davis
    yesterday































The other answers are correct: there is no way to recover data from a hash.

From your phrasing, I think this may be an instance of the XY problem: you need very aggressive lossless compression, and hashes plus some way to undo them are the closest thing you know of.

Accordingly, you might look into an abuse of fractal compression: oversample your bitstream such that fractal compression gives an image that reproduces the original bitstream after a downsampling pass. In principle this trades the length of your transmitted message for potentially large amounts of pre- and post-computation. You only have to transmit the coefficients and stopping conditions of your fractal calculation, plus the program that understands them, which could be as small as a few kilobytes; but the search to find those numbers is computationally hard.






cakins is a new contributor to this site.


















  • Isn't it lossy, though? You'd lose a little entropy every time stuff is compressed.
    – Paul Uszak
    6 hours ago












  • I have only imagined this, not proven it correct. In principle the oversampling accounts for the lossiness with extra redundancy. In practice I don't know how well or even if it works, I'm no information theorist.
    – cakins
    3 hours ago































Simple information theory shows that this is not possible. For any given hash value, there's an infinite number of files that produce that hash (assuming there's no arbitrary limit on the input length). It's possible(*) to produce a file that produces the same hash, but because information has been discarded, there's no way to say whether or not it's the file that was originally hashed - all of those infinitely many inputs are equally plausible.





(*) "possible" in a theoretical sense - the role of a good cryptographic hash is to make such reconstruction hard - preferably as hard as randomly guessing until a match is found.






Toby Speight is a new contributor to this site.






































It's not possible to get the information back from the hash. However, data can be converted into a Base64 string, which can be decoded back into the original. Maybe you are confusing that with a hash?
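To make the contrast concrete: Base64 is a reversible encoding, not a hash, and it makes data about a third larger rather than shrinking it to a fixed size. A quick check with Python's standard `base64` module, using random bytes as a stand-in for a 5 MiB file (an assumption for illustration):

```python
import base64
import os

data = os.urandom(5 * 1024 * 1024)  # stand-in for a 5 MiB file

encoded = base64.b64encode(data)
decoded = base64.b64decode(encoded)

# Base64 emits 4 output bytes per 3 input bytes (rounded up), so it grows the data.
print(len(data), len(encoded))
assert decoded == data  # fully reversible, unlike a hash
```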






Sooraj is a new contributor to this site.


















    • Thanks for the answer
      – Anu Davis
      yesterday






    • 1




      Base64 conversion is just another encoding of the plaintext; the question asked for a method of hash reversal.
      – AleksanderRas
      yesterday











    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "281"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    noCode: true, onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });






    Anu Davis is a new contributor. Be nice, and check out our Code of Conduct.










     

    draft saved


    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcrypto.stackexchange.com%2fquestions%2f64194%2fcan-a-large-file-be-hashed-down-to-32-bytes-and-then-reconstructed-from-the-has%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    5 Answers
    5






    active

    oldest

    votes








    5 Answers
    5






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    44
    down vote



    accepted










    No, there is no way to compress (or hash or encrypt or whatever) a 5 MB file into a 32 byte hash and then reconstruct the original file just from the hash alone.



    This is simply because there are many more possible 5 MB files than there are 32 byte hashes. This means that, whatever hashing or compression or other algorithm you use, it must map many different 5 MB files to the same 32 byte hash. And that means that, given only the 32 byte hash, there is no way you can possibly tell which of those different 5 MB files it was created from.



    In fact, the same thing happens already if you hash 33 byte files into 32 byte hashes, and then try to reconstruct the original files from the hashes. Since there are 256 times as many 33 byte files as there are 32 byte hashes, that already means that there must be several different files that have the same hash. With 5 MB files, it's many, many, many times worse yet.





    So how can something like IPFS work, then?



    Basically, it relies on the fact that even the number of possible 32 byte hashes is really huge* — much, much larger than the total number of actual files (of any length) that humans have ever created, or are ever likely to create. So, while we know that there must be many possible files that have the same 32 byte hash, the chance of actually finding two different files that just happen to have the same hash by chance is still so incredibly small that we can basically assume it will never happen.



    (Also, cryptographic hash functions like SHA-256 are designed so that, hopefully, there are no practical ways to deliberately find files with the same hash more efficiently than by just hashing lots of files and hoping against all odds for a random collision.)



    This means that if we have some kind of a (possibly distributed) database containing a bunch of files and their SHA-256 hashes, then we can be pretty sure that it will never actually contain two files with the same hash, even if that's theoretically possible.



    Thus, as long as we have access to such a database, we can use the hash of any file in the database to look it up, and be almost 100% certain that we will only get one matching file back, not two or more. Technically, the probability of getting multiple matches is not quite exactly zero, but it's so incredibly small that it can be safely neglected in practice.





    *) In fact, it's 28×32 = 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936.






    share|improve this answer























    • Comments are not for extended discussion; this conversation has been moved to chat.
      – SEJPM
      4 hours ago






    • 3




      Your first paragraph is technically incorrect. It is entirely possible to compress a 5MB file into 32 bytes. It just isn't possible to generalise it to all 5MB files (e.g. my algorithm can be If(text == completeWorksOfShakespeare) return stringThatIs32Bytes; else return text;)
      – JBentley
      4 hours ago

















    up vote
    44
    down vote



    accepted










    No, there is no way to compress (or hash or encrypt or whatever) a 5 MB file into a 32 byte hash and then reconstruct the original file just from the hash alone.



    This is simply because there are many more possible 5 MB files than there are 32 byte hashes. This means that, whatever hashing or compression or other algorithm you use, it must map many different 5 MB files to the same 32 byte hash. And that means that, given only the 32 byte hash, there is no way you can possibly tell which of those different 5 MB files it was created from.



    In fact, the same thing happens already if you hash 33 byte files into 32 byte hashes, and then try to reconstruct the original files from the hashes. Since there are 256 times as many 33 byte files as there are 32 byte hashes, that already means that there must be several different files that have the same hash. With 5 MB files, it's many, many, many times worse yet.





    So how can something like IPFS work, then?



    Basically, it relies on the fact that even the number of possible 32 byte hashes is really huge* — much, much larger than the total number of actual files (of any length) that humans have ever created, or are ever likely to create. So, while we know that there must be many possible files that have the same 32 byte hash, the chance of actually finding two different files that just happen to have the same hash by chance is still so incredibly small that we can basically assume it will never happen.



    (Also, cryptographic hash functions like SHA-256 are designed so that, hopefully, there are no practical ways to deliberately find files with the same hash more efficiently than by just hashing lots of files and hoping against all odds for a random collision.)



    This means that if we have some kind of a (possibly distributed) database containing a bunch of files and their SHA-256 hashes, then we can be pretty sure that it will never actually contain two files with the same hash, even if that's theoretically possible.



    Thus, as long as we have access to such a database, we can use the hash of any file in the database to look it up, and be almost 100% certain that we will only get one matching file back, not two or more. Technically, the probability of getting multiple matches is not quite exactly zero, but it's so incredibly small that it can be safely neglected in practice.





    *) In fact, it's 28×32 = 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936.






    share|improve this answer























    • Comments are not for extended discussion; this conversation has been moved to chat.
      – SEJPM
      4 hours ago






    • 3




      Your first paragraph is technically incorrect. It is entirely possible to compress a 5MB file into 32 bytes. It just isn't possible to generalise it to all 5MB files (e.g. my algorithm can be If(text == completeWorksOfShakespeare) return stringThatIs32Bytes; else return text;)
      – JBentley
      4 hours ago















    up vote
    44
    down vote



    accepted







    up vote
    44
    down vote



    accepted






    No, there is no way to compress (or hash or encrypt or whatever) a 5 MB file into a 32 byte hash and then reconstruct the original file just from the hash alone.



    This is simply because there are many more possible 5 MB files than there are 32 byte hashes. This means that, whatever hashing or compression or other algorithm you use, it must map many different 5 MB files to the same 32 byte hash. And that means that, given only the 32 byte hash, there is no way you can possibly tell which of those different 5 MB files it was created from.



    In fact, the same thing happens already if you hash 33 byte files into 32 byte hashes, and then try to reconstruct the original files from the hashes. Since there are 256 times as many 33 byte files as there are 32 byte hashes, that already means that there must be several different files that have the same hash. With 5 MB files, it's many, many, many times worse yet.





    So how can something like IPFS work, then?



    Basically, it relies on the fact that even the number of possible 32 byte hashes is really huge* — much, much larger than the total number of actual files (of any length) that humans have ever created, or are ever likely to create. So, while we know that there must be many possible files that have the same 32 byte hash, the chance of actually finding two different files that just happen to have the same hash by chance is still so incredibly small that we can basically assume it will never happen.



    (Also, cryptographic hash functions like SHA-256 are designed so that, hopefully, there are no practical ways to deliberately find files with the same hash more efficiently than by just hashing lots of files and hoping against all odds for a random collision.)



    This means that if we have some kind of a (possibly distributed) database containing a bunch of files and their SHA-256 hashes, then we can be pretty sure that it will never actually contain two files with the same hash, even if that's theoretically possible.



    Thus, as long as we have access to such a database, we can use the hash of any file in the database to look it up, and be almost 100% certain that we will only get one matching file back, not two or more. Technically, the probability of getting multiple matches is not quite exactly zero, but it's so incredibly small that it can be safely neglected in practice.





    *) In fact, it's 28×32 = 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936.






    share|improve this answer














    No, there is no way to compress (or hash or encrypt or whatever) a 5 MB file into a 32 byte hash and then reconstruct the original file just from the hash alone.



    This is simply because there are many more possible 5 MB files than there are 32 byte hashes. This means that, whatever hashing or compression or other algorithm you use, it must map many different 5 MB files to the same 32 byte hash. And that means that, given only the 32 byte hash, there is no way you can possibly tell which of those different 5 MB files it was created from.



    In fact, the same thing happens already if you hash 33 byte files into 32 byte hashes, and then try to reconstruct the original files from the hashes. Since there are 256 times as many 33 byte files as there are 32 byte hashes, that already means that there must be several different files that have the same hash. With 5 MB files, it's many, many, many times worse yet.





    So how can something like IPFS work, then?



    Basically, it relies on the fact that even the number of possible 32 byte hashes is really huge* — much, much larger than the total number of actual files (of any length) that humans have ever created, or are ever likely to create. So, while we know that there must be many possible files that have the same 32 byte hash, the chance of actually finding two different files that just happen to have the same hash by chance is still so incredibly small that we can basically assume it will never happen.



    (Also, cryptographic hash functions like SHA-256 are designed so that, hopefully, there are no practical ways to deliberately find files with the same hash more efficiently than by just hashing lots of files and hoping against all odds for a random collision.)



    This means that if we have some kind of a (possibly distributed) database containing a bunch of files and their SHA-256 hashes, then we can be pretty sure that it will never actually contain two files with the same hash, even if that's theoretically possible.



    Thus, as long as we have access to such a database, we can use the hash of any file in the database to look it up, and be almost 100% certain that we will only get one matching file back, not two or more. Technically, the probability of getting multiple matches is not quite exactly zero, but it's so incredibly small that it can be safely neglected in practice.





    *) In fact, it's 28×32 = 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited 10 hours ago

























    answered yesterday









    Ilmari Karonen

    33.4k267133




    33.4k267133












    • Comments are not for extended discussion; this conversation has been moved to chat.
      – SEJPM
      4 hours ago






    • Your first paragraph is technically incorrect. It is entirely possible to compress a 5MB file into 32 bytes. It just isn't possible to generalise it to all 5MB files (e.g. my algorithm can be If(text == completeWorksOfShakespeare) return stringThatIs32Bytes; else return text;)
      – JBentley
      4 hours ago
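JBentley's point can be made concrete with a toy scheme. This is a hypothetical sketch: SPECIAL_FILE stands in for the complete works of Shakespeare, and the 0x01 marker byte is an addition that the one-line version in the comment omits. Without such a marker, an ordinary input that happened to equal the 32-byte code could not be told apart from the special file.

```python
# Hypothetical stand-in (roughly 5 MB) for the one file this toy scheme special-cases.
SPECIAL_FILE = b"To be, or not to be, that is the question. " * 120_000

SHORT_CODE = b"\x00" * 32  # the 32-byte output reserved for SPECIAL_FILE

def compress(data: bytes) -> bytes:
    if data == SPECIAL_FILE:
        return SHORT_CODE
    # Every other input gets a 0x01 marker so it can never be confused with SHORT_CODE.
    return b"\x01" + data

def decompress(blob: bytes) -> bytes:
    if blob == SHORT_CODE:
        return SPECIAL_FILE  # the decompressor must already contain the file
    if blob[:1] == b"\x01":
        return blob[1:]
    raise ValueError("not produced by compress()")
```

The price is that every other input grows by one byte, which is what the pigeonhole argument predicts: a lossless scheme that shrinks some inputs must lengthen others.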
































    up vote
    12
    down vote













    No, it's not possible to retrieve the original input of a hash, because the input of a hash can be of almost any size (for SHA-256, up to 2^64 - 1 bits, i.e. just under 2,097,152 TiB).



    But the hash value always has a fixed length; e.g. SHA-256 always produces a 256-bit value. That's why there must be multiple inputs that have the same hash value; see the pigeonhole principle.



    That's also why you can never retrieve the original input: you can never be sure that a given candidate really is the original input and not some other input that happens to have the same hash value.
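The pigeonhole argument can be watched in action on a scaled-down hash. The sketch below keeps only the first 16 bits of SHA-256, an assumption made purely so the search finishes instantly; the function names are hypothetical. With 2**16 possible values, any 2**16 + 1 distinct inputs must contain a collision. The same argument holds for the full 256 bits; only the numbers become astronomically large.

```python
import hashlib

def truncated_sha256(data: bytes, bits: int = 16) -> int:
    """First `bits` bits of SHA-256(data), as an integer (a deliberately tiny 'hash')."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int = 16) -> tuple[bytes, bytes]:
    seen: dict[int, bytes] = {}
    # Pigeonhole: among 2**bits + 1 distinct inputs, two must share a truncated hash.
    for i in range(2 ** bits + 1):
        msg = str(i).encode()
        h = truncated_sha256(msg, bits)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
    raise AssertionError("unreachable by the pigeonhole principle")

a, b = find_collision()
print(a, b)  # two different inputs with the same 16-bit hash
```

Given only that 16-bit value, there is no way to tell whether `a` or `b` (or any of many other inputs) was the one originally hashed.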





























    • Thanks for the answer. Is there any encryption method which encrypts a 5 MB file and outputs hash-like content, or 32 bytes of content, so that we can retrieve the file from that 32-byte content?
      – Anu Davis
      yesterday








    • No. Like Maeher already said: it is impossible. You can see this by applying the pigeonhole principle: you have $2^{256}$ possible hashes/encryptions/encodings/pumpernickels, but many, many more possible plaintexts. It would imply that most encryptions have multiple decryptions, just like this answer explains.
      – Ruben De Smet
      yesterday












    • Thanks for the answer
      – Anu Davis
      yesterday















    edited yesterday

























    answered yesterday









    AleksanderRas











    up vote
    2
    down vote













    The other answers are correct: there is no way to recover data from a hash.



    From your phrasing, I think that this may be an instance of an X-Y problem: you need very aggressive lossless compression, and hashes plus some way to undo them are the closest thing you know of.



    Accordingly, you might look into an abuse of fractal compression: oversample your bitstream such that fractal compression gives an image that results in the original bitstream after a downsampling pass. In principle, this can trade the length of your transmitted message for potentially large amounts of pre- and post-computation: you only have to transmit the coefficients and stopping conditions of your fractal calculation and the program that understands them, which could be as small as a few kilobytes, but the search to find those numbers is computationally hard.
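This is not fractal compression, but a general-purpose lossless compressor such as zlib (used here only as a stand-in) illustrates the underlying trade-off: highly redundant input, like 5 MB of zero bytes, shrinks dramatically, while random bytes do not shrink at all. The input sizes are arbitrary choices for the example.

```python
import os
import zlib

zeros = bytes(5_000_000)       # ~5 MB of zero bytes: highly redundant
noise = os.urandom(1_000_000)  # 1 MB of random bytes: essentially incompressible

packed_zeros = zlib.compress(zeros, 9)
packed_noise = zlib.compress(noise, 9)

print(len(packed_zeros), len(packed_noise))

# Lossless: both round-trip exactly.
assert zlib.decompress(packed_zeros) == zeros
assert zlib.decompress(packed_noise) == noise
```

Any aggressive compression scheme, fractal or otherwise, can only win on inputs with this kind of structure; no scheme can shrink every 5 MB file to 32 bytes.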














    New contributor




    cakins is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.


















    • Isn't it lossy though? You'd lose a little entropy every time stuff is compressed.
      – Paul Uszak
      6 hours ago












    • I have only imagined this, not proven it correct. In principle the oversampling accounts for the lossiness with extra redundancy. In practice I don't know how well or even if it works, I'm no information theorist.
      – cakins
      3 hours ago















    up vote
    2
    down vote













    The other answers are correct, there is no way to recover data from a hash.



    From your phrasing, I think that this may be an instance of an X-Y problem: you need very aggressive lossless compression, and hashes plus some way to undo them are the closest thing you know of.



    Accordingly you might look into an abuse of fractal compression: oversample your bitstream such that fractal compression gives an image that results in the original bitstream after a downsampling pass. In principle this can trade the length of your transmitted message for potentially large amounts of pre- and post-computation - you only have to transmit the coefficients and stopping conditions of your fractal calculation and the program that understands them, which could be as small as a few kilobytes, but the search to find those numbers is computationally hard.






    share|improve this answer








    New contributor




    cakins is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.


















    • Isn't it lossy though? You'd loose a little entropy every time stuff is compressed.
      – Paul Uszak
      6 hours ago












    • I have only imagined this, not proven it correct. In principle the oversampling accounts for the lossiness with extra redundancy. In practice I don't know how well or even if it works, I'm no information theorist.
      – cakins
      3 hours ago













    up vote
    2
    down vote










    up vote
    2
    down vote









    The other answers are correct, there is no way to recover data from a hash.



    From your phrasing, I think that this may be an instance of an X-Y problem: you need very aggressive lossless compression, and hashes plus some way to undo them are the closest thing you know of.



    Accordingly you might look into an abuse of fractal compression: oversample your bitstream such that fractal compression gives an image that results in the original bitstream after a downsampling pass. In principle this can trade the length of your transmitted message for potentially large amounts of pre- and post-computation - you only have to transmit the coefficients and stopping conditions of your fractal calculation and the program that understands them, which could be as small as a few kilobytes, but the search to find those numbers is computationally hard.






    share|improve this answer








    New contributor




    cakins is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.









    The other answers are correct, there is no way to recover data from a hash.



    From your phrasing, I think that this may be an instance of an X-Y problem: you need very aggressive lossless compression, and hashes plus some way to undo them are the closest thing you know of.



    Accordingly you might look into an abuse of fractal compression: oversample your bitstream such that fractal compression gives an image that results in the original bitstream after a downsampling pass. In principle this can trade the length of your transmitted message for potentially large amounts of pre- and post-computation - you only have to transmit the coefficients and stopping conditions of your fractal calculation and the program that understands them, which could be as small as a few kilobytes, but the search to find those numbers is computationally hard.







    share|improve this answer








    New contributor




    cakins is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.









    share|improve this answer



    share|improve this answer






    New contributor




    cakins is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.









    answered 7 hours ago









    cakins





    I have only imagined this, not proven it correct. In principle the oversampling accounts for the lossiness with extra redundancy. In practice I don't know how well or even if it works, I'm no information theorist.
    – cakins
    3 hours ago










    up vote
    0
    down vote













    Simple information theory shows that this is not possible. For any given hash value, there are infinitely many files that produce that hash (assuming there's no arbitrary limit on the input length). It's possible(*) to produce a file that produces the same hash, but because information has been discarded, there's no way to say whether or not it's the file that was originally hashed: all of those infinitely many inputs are equally plausible.





    (*) "possible" in a theoretical sense - the role of a good cryptographic hash is to make such reconstruction hard - preferably as hard as randomly guessing until a match is found.
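"Randomly guessing until a match is found" can be sketched on a toy 16-bit hash. The 16-bit truncation is an assumption chosen so the search needs only about 2**16 guesses, and `tiny_hash` and `guess_preimage` are hypothetical helpers. The search does find an input with the right hash, but it is not the original file, which is exactly the point above.

```python
import hashlib

def tiny_hash(data: bytes, bits: int = 16) -> int:
    """Hypothetical toy hash: the top `bits` bits of SHA-256(data)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - bits)

def guess_preimage(target: int, bits: int = 16, max_tries: int = 2 ** 24) -> bytes:
    """Try candidate inputs in order until one hashes to `target`."""
    for i in range(max_tries):
        candidate = f"guess-{i}".encode()
        if tiny_hash(candidate, bits) == target:
            return candidate
    raise RuntimeError("no preimage found within max_tries")

original = b"the file that was actually hashed"
found = guess_preimage(tiny_hash(original))
print(found, found == original)  # same hash, different input
```

For the full 256-bit SHA-256, the same brute-force search would need on the order of 2**256 guesses, which is what makes it infeasible in practice.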














    New contributor




    Toby Speight is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.






















        answered 2 hours ago









        Toby Speight























            up vote
            -1
            down vote













            It's not possible to get back the information from the hash. However, data can be converted into a Base64 string, which can be decoded back to the original. Maybe you are confusing that with a hash?
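For contrast, here is Base64 using Python's standard `base64` module. It is fully reversible, but it makes the data about a third larger rather than shrinking it to a fixed 32 bytes. The random 3,000-byte input is only a stand-in for a file.

```python
import base64
import os

data = os.urandom(3_000)          # stand-in for a file's bytes
encoded = base64.b64encode(data)  # reversible encoding, not a hash

assert base64.b64decode(encoded) == data  # decoding recovers the exact input
print(len(data), len(encoded))            # 3000 -> 4000: output is 4/3 the size
```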














            New contributor




            Sooraj is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
            Check out our Code of Conduct.


















            • Thanks for the answer
              – Anu Davis
              yesterday






            • Base64 conversion is just another encoding of the plaintext; the question asked for a method of hash reversal.
              – AleksanderRas
              yesterday















            answered yesterday









            Sooraj










