|
|
|
08-19-2019, 06:57 PM
|
#1
|
Human being with feelings
Join Date: Sep 2018
Location: China
Posts: 565
|
How to tell whether two files are the same or not
For example, I have two files, D:\1.wav and E:\1.wav. Is it possible to compare them with the built-in, SWS, or JS API and tell whether they are the same file or just have the same name?
|
|
|
08-19-2019, 09:30 PM
|
#2
|
Human being with feelings
Join Date: Oct 2013
Location: Moscow, Russia
Posts: 3,984
|
io.open in binary mode, check content string length and some random bytes, no extension needed
|
|
|
08-19-2019, 09:39 PM
|
#3
|
Human being with feelings
Join Date: Sep 2018
Location: China
Posts: 565
|
Quote:
Originally Posted by mpl
io.open in binary mode, check content string length and some random bytes, no extension needed
|
Thanks mpl!
|
|
|
08-20-2019, 05:49 AM
|
#4
|
Human being with feelings
Join Date: Apr 2013
Location: France
Posts: 9,900
|
Rather than comparing bit by bit, just compare checksums :P
|
|
|
08-20-2019, 08:53 PM
|
#5
|
Human being with feelings
Join Date: Sep 2018
Location: China
Posts: 565
|
Quote:
Originally Posted by X-Raym
Rather than comparing bit by bit, just compare checksums :P
|
Could you give me a few more hints about how to compare checksums, please?
|
|
|
08-21-2019, 01:51 AM
|
#6
|
Human being with feelings
Join Date: Apr 2013
Location: France
Posts: 9,900
|
Use a checksum / MD5 Lua library; there are some on GitHub:
https://github.com/kikito/md5.lua
https://github.com/philanc/plc/tree/master/plc
I haven't tested it, but the main idea is that it outputs a practically unique string for a file,
so if two files are different, it will output different strings.
You only need to make a string comparison based on that.
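A minimal sketch of the checksum idea. To stay self-contained it uses Adler-32, a simple non-cryptographic checksum, in place of the MD5 libraries linked above; `adler32_update`, `adler32`, and `file_checksum` are hypothetical helper names. Equal checksums make it very likely, but do not prove, that two files are identical.

```lua
-- Adler-32 over a string, continuing from the running sums a and b.
local function adler32_update(a, b, s)
  for i = 1, #s do
    a = (a + s:byte(i)) % 65521
    b = (b + a) % 65521
  end
  return a, b
end

-- Adler-32 of a whole string.
local function adler32(s)
  local a, b = adler32_update(1, 0, s)
  return b * 65536 + a
end

-- Checksum of a file, read in 64 KiB chunks.
-- Returns nil if the file cannot be opened.
local function file_checksum(path)
  local f = io.open(path, "rb")
  if not f then return nil end
  local a, b = 1, 0
  while true do
    local chunk = f:read(65536)
    if not chunk then break end
    a, b = adler32_update(a, b, chunk)
  end
  f:close()
  return b * 65536 + a
end
```

Comparing two files is then `file_checksum(path1) == file_checksum(path2)`, after checking that neither result is nil.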
|
|
|
08-21-2019, 03:06 AM
|
#7
|
Human being with feelings
Join Date: Feb 2007
Location: Oulu, Finland
Posts: 8,062
|
Quote:
Originally Posted by X-Raym
You only need to make a string comparison based on that.
|
But to get the checksum, the file of course has to be read and processed fully first anyway, so using checksums doesn't necessarily make anything faster. For example, if you need to compare 2 files just once, you might as well compare the files directly at the byte level. That could even be faster than calculating the checksums, because you can exit the comparison early at the first difference between the files, while checksums require going through the entire file.
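A rough sketch of such a direct comparison with an early exit, reading both files in chunks so that large files are never held in memory whole. `files_equal` and `chunk_size` are illustrative names, not part of any REAPER or Lua API.

```lua
-- Returns true if both files have identical contents, false if they differ,
-- and nil plus a message if either file cannot be opened.
local function files_equal(path_a, path_b, chunk_size)
  chunk_size = chunk_size or 65536
  local fa = io.open(path_a, "rb")
  if not fa then return nil, "cannot open " .. path_a end
  local fb = io.open(path_b, "rb")
  if not fb then fa:close() return nil, "cannot open " .. path_b end

  -- Different sizes: the files cannot be identical, so no content is read.
  local equal = fa:seek("end") == fb:seek("end")
  fa:seek("set", 0)
  fb:seek("set", 0)

  while equal do
    local a, b = fa:read(chunk_size), fb:read(chunk_size)
    if a == nil and b == nil then break end -- both files fully read
    equal = (a == b) -- a mismatch ends the loop at once
  end

  fa:close()
  fb:close()
  return equal
end
```

Lua compares strings by content, so `a == b` checks every byte of the two chunks.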
__________________
I am no longer part of the REAPER community. Please don't contact me with any REAPER-related issues.
Last edited by Xenakios; 08-21-2019 at 03:12 AM.
|
|
|
08-21-2019, 09:36 AM
|
#8
|
Human being with feelings
Join Date: Apr 2013
Location: France
Posts: 9,900
|
@Xenakios
You are right,
so a checksum is relevant if a file needs to be compared to several files, or if one of the files is inaccessible (provided you already have its checksum).
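The "one known checksum against several files" case might look like the sketch below. `find_matches` is a hypothetical helper, and `checksum_of` stands for any caller-supplied function that maps a path to a checksum string (for example one built on the MD5 libraries linked earlier). The checksum values in the usage lines are made up for illustration.

```lua
-- Returns the paths whose checksum equals known_sum.
-- checksum_of(path) is supplied by the caller and is called once per path.
local function find_matches(known_sum, paths, checksum_of)
  local matches = {}
  for _, path in ipairs(paths) do
    if checksum_of(path) == known_sum then
      matches[#matches + 1] = path
    end
  end
  return matches
end

-- Usage with a stand-in checksum table instead of real files.
local fake_sums = {
  ["D:\\1.wav"] = "aa11",
  ["E:\\1.wav"] = "aa11",
  ["E:\\2.wav"] = "bb22",
}
local hits = find_matches("aa11", { "D:\\1.wav", "E:\\1.wav", "E:\\2.wav" },
  function(path) return fake_sums[path] end)
print(#hits) --> 2
```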
|
|
|
08-22-2019, 01:35 AM
|
#9
|
Human being with feelings
Join Date: Sep 2018
Location: China
Posts: 565
|
Quote:
Originally Posted by Xenakios
For example, if you need to compare 2 files just once, you might as well compare the files directly at the byte level. That could even be faster than calculating the checksums, because you can exit the comparison early at the first difference between the files, while checksums require going through the entire file.
|
So I tried to write a script like this
Code:
-- msg() is assumed to be a user-defined helper that prints to the REAPER console
local file=io.open("e:\\1.wav", "rb")
local content=file:read("*all")
local len1=content:len()
file:close()
local f1={}
for i=1, len1 do --store each byte of file1
  f1[i]=content:byte(i)
end
file=io.open("e:\\2.wav", "rb")
content=file:read("*all")
local len2=content:len()
file:close()
local f2={}
for i=1, len2 do --store each byte of file2
  f2[i]=content:byte(i)
end
local diff=len1~=len2 --files of different sizes cannot be the same
if not diff then
  for k, v in ipairs(f2) do
    if v~=f1[k] then
      diff=true
      break
    end
  end
end
if diff then msg("They are different") else msg("They are the same") end
Is it the right way to compare two files?
Last edited by dsyrock; 08-22-2019 at 02:18 AM.
|
|
|
08-22-2019, 07:32 AM
|
#10
|
Human being with feelings
Join Date: Oct 2013
Location: Moscow, Russia
Posts: 3,984
|
Slightly easier check I suggested before:
Code:
-- msg() is assumed to be a user-defined helper that prints to the REAPER console
local content1, content2, len1, len2
local file1=io.open("e:\\1.wav", "rb")
if file1 then
  content1=file1:read("*all")
  len1=content1:len()
  file1:close()
end
local file2=io.open("e:\\2.wav", "rb")
if file2 then
  content2=file2:read("*all")
  len2=content2:len()
  file2:close()
end
if not (content1 and content2) then
  msg("Could not open one of the files")
elseif len1~=len2 then
  msg("They are different")
else
  local same=true
  for i=1, 3 do --test three random positions (string positions start at 1)
    local pos=math.random(1, math.max(len1, 1))
    if content1:byte(pos)~=content2:byte(pos) then
      same=false
      break
    end
  end
  --a pass only means the files are probably the same
  if same then msg("They are the same") else msg("They are different") end
end
|
|
|
08-23-2019, 10:59 AM
|
#11
|
Human being with feelings
Join Date: Sep 2018
Location: China
Posts: 565
|
Quote:
Originally Posted by mpl
Slightly easier check I suggested before:
|
Thanks again. It's much faster than comparing byte by byte. Is it safe enough? I mean, is it possible for two files to be very similar to each other and have the same size? In that situation, can comparing random bytes tell that they are different?
|
|
|
08-23-2019, 11:00 AM
|
#12
|
Human being with feelings
Join Date: Apr 2013
Location: France
Posts: 9,900
|
@dsyrock
Of course
A single ASCII letter is only 8 bits, so a one-letter difference in a text document would almost certainly not be found if random bits are sampled (even more so when you consider that different characters can share the same bits at some positions).
Bit by bit is the only way to be sure when the files have the same number of bits.
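To put a rough number on this, assume each test position is drawn independently and uniformly, which is an assumption about how the random check is implemented. With `n` bytes per file, `d` differing bytes, and `k` sampled positions, the chance of hitting at least one difference is 1 - (1 - d/n)^k. `detection_probability` is a hypothetical helper name.

```lua
-- Probability that k uniform random probes hit at least one of the
-- d differing bytes in a file of n bytes (probes drawn with replacement).
local function detection_probability(n, d, k)
  return 1 - (1 - d / n) ^ k
end

-- One changed byte in a 10 MB file, three random probes:
-- roughly 3 chances in 10 million of noticing the difference.
print(detection_probability(10 * 1000 * 1000, 1, 3))
```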
|
|
|
08-24-2019, 01:22 PM
|
#13
|
Human being with feelings
Join Date: May 2017
Location: Leipzig
Posts: 6,630
|
As X-Raym pointed out: the only way to be sure the result is exact is to check every single byte.
You could try using a command-line comparison tool instead, which might be faster than doing it in Lua.
|
|
|
08-24-2019, 01:41 PM
|
#14
|
Human being with feelings
Join Date: Feb 2007
Location: Oulu, Finland
Posts: 8,062
|
Quote:
Originally Posted by X-Raym
@dsyrock
Bit by bit is the only way to be sure when the files have the same number of bits.
|
Or rather byte by byte. No sane code these days would do actual bit-by-bit comparisons unless there was a really good reason for it.
|
|
|