First of all, thank you for Everything. It is an excellent program!
First question.
Are there any plans to introduce the ability to set the number of threads for individual folders/disks that will be used to index properties?
A real-life example: there are several network storage devices, each exposing several network folders via Samba. Some devices handle about 10-20 simultaneous requests well. Others only work well with one thread at a time: with, for example, 10 simultaneous requests, their response time increases 50-fold or more. And finally, some devices reach maximum performance at 4-6 simultaneous requests.
In my case, most of the indexing time is spent computing SHA-256 file hashes.
It seems that being able to set the number of threads individually for each device could speed up hash calculation.
And the second question.
Will it be possible to add a new property for folders: calculate a hash based on existing (previously calculated) hashes of all files inside the folder?
At the moment, the folder hash can be calculated:
* based on file names, including the folder name - this finds identical copies by name, but cannot find copies whose base (root, parent) folders have different names
* based on file hashes - but the contents of all files are read again for each folder. For example, if you add a folder with the following structure to the index:
my-app/
├─ X/
│ ├─ Y/
│ │ ├─ Z/
│ │ │ ├─ file_1_terabyte
├─ file_1_megabyte
then file_1_terabyte will be read from disk at least 3 times: once each for the SHA-256 hashes of folders X, X/Y and X/Y/Z. And if you also add the SHA-256 property for files, file_1_terabyte will be read a fourth time.
It would be great to have an option to calculate the folder hash from the list of file hashes that have already been calculated. That way the contents of each file would only have to be read once.
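A rough Python sketch of the idea, using the structure from the example above. The function name `folder_hashes`, the nested-dict tree format, and the placeholder file hashes are my own assumptions for illustration, not anything Everything provides. Each folder hash is built from the names and already-known hashes of its direct children, so no file content is read at all. The folder's own name is left out, so copies under differently named root folders get the same hash.

```python
import hashlib


def folder_hashes(tree, file_hashes, path="", out=None):
    """Compute a hash for every folder from cached file hashes.

    tree: nested dict, where a value of None marks a file and a dict marks
          a subfolder (an assumed format for this sketch).
    file_hashes: relative file path -> hex SHA-256 computed earlier.
    Returns a dict mapping relative folder path ("" is the root) to its hash.
    """
    if out is None:
        out = {}
    digest = hashlib.sha256()
    for name in sorted(tree):  # sorted so the result does not depend on listing order
        child = tree[name]
        child_path = f"{path}/{name}" if path else name
        if child is None:
            child_hash = file_hashes[child_path]  # reuse the cached file hash
        else:
            folder_hashes(child, file_hashes, child_path, out)
            child_hash = out[child_path]  # reuse the subfolder hash just computed
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(bytes.fromhex(child_hash))
    out[path] = digest.hexdigest()
    return out


# The contents of my-app/ from the example above.
tree = {"X": {"Y": {"Z": {"file_1_terabyte": None}}}, "file_1_megabyte": None}

# Placeholder values standing in for hashes that were already calculated.
file_hashes = {
    "X/Y/Z/file_1_terabyte": hashlib.sha256(b"placeholder big file").hexdigest(),
    "file_1_megabyte": hashlib.sha256(b"placeholder small file").hexdigest(),
}

hashes = folder_hashes(tree, file_hashes)
for folder, value in sorted(hashes.items()):
    print(folder or "(root)", value)
```

With this approach the hashes of X, X/Y and X/Y/Z all come from the single stored hash of file_1_terabyte. Whether child names should be part of the folder hash is a design choice; dropping them would also match copies whose files were renamed.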
Statistics: Posted by tirael — Mon Sep 30, 2024 9:50 am