Reserved de-dupe rules


I'm wanting to refine the de-duping rules, but first of all I'd like to find out exactly what the predefined rules are before I create my own. They are reserved, so you can't edit them, which is fine, but CiviCRM only tells you which fields they use, not what the weights and thresholds are, so the full behaviour is not clear.










      Tags: duplicate-contacts






      asked Apr 4 at 17:09 by Mick Kahn




















          2 Answers
          Demerit's and Mick's answers are incorrect for the (built-in) reserved rules - though it's definitely confusing!



          If a RuleGroup has a value in its name field, and that name corresponds to a filename in CRM/Dedupe/BAO/QueryBuilder, then the custom SQL in that file is used. The existing entries in civicrm_dedupe_rule for those RuleGroups are holdovers from before that system existed, and editing them has no effect.
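          The dispatch described above can be sketched as follows. This is illustrative Python, not CiviCRM's PHP: HANDWRITTEN_QUERY_FILES and matching_strategy are hypothetical names, and the only file name taken from this answer is IndividualUnsupervised.php.

```python
from typing import Optional

# Hypothetical stand-in for the contents of CRM/Dedupe/BAO/QueryBuilder.
# Only IndividualUnsupervised.php comes from the answer; the set is illustrative.
HANDWRITTEN_QUERY_FILES = {"IndividualUnsupervised.php"}


def matching_strategy(rule_group_name: Optional[str]) -> str:
    """Return which mechanism scores a rule group, per the description above."""
    if rule_group_name and f"{rule_group_name}.php" in HANDWRITTEN_QUERY_FILES:
        # A matching file exists: its handwritten SQL is used, and the
        # group's rows in civicrm_dedupe_rule have no effect.
        return "handwritten SQL"
    # No name, or no matching file: the per-field rules, weights and
    # threshold stored in the database decide.
    return "weighted rules"


print(matching_strategy("IndividualUnsupervised"))  # handwritten SQL
print(matching_strategy("MyCustomRule"))  # weighted rules
```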



          "Standard" dedupe rules with multiple criteria are very inefficient compared to handwritten SQL, which is why this is a valuable technique. You can create your own handwritten queries with hook_civicrm_dedupe, and the Veda dedupe extension has a number of excellent examples. Note that this extension doesn't work on modern Civi because of some of its other functions, but the dedupe rules can be ripped out into something else.



          Finally - I learned just yesterday that the built-in handwritten dedupe rules seem to execute different SQL when comparing in Unsupervised/Supervised mode (a single contact) vs. General mode (find all dupes). While I haven't proved it, I suspect that if you're in the rare scenario of needing to optimize your unsupervised/supervised dedupes, creating a new class to extend CRM_Dedupe_BAO_QueryBuilder is the way to go. I just posted org.agbu.optimizeddedupe to provide an example of this.



          UPDATE: More clarification.



          To understand how the queries work, it's best to look at an example, e.g. CRM/Dedupe/BAO/QueryBuilder/IndividualUnsupervised.php.



          The internal function is used if you go to Contacts » Find and Merge Duplicate Contacts and click Use Rule. The SQL query is:





          SELECT contact1.id as id1, contact2.id as id2, $rg->threshold as weight
          FROM civicrm_contact as contact1
          JOIN civicrm_email as email1 ON email1.contact_id=contact1.id
          JOIN civicrm_contact as contact2 ON
            contact1.first_name = contact2.first_name AND
            contact1.last_name = contact2.last_name
          JOIN civicrm_email as email2 ON
            email2.contact_id=contact2.id AND
            email1.email=email2.email
          WHERE contact1.contact_type = 'Individual'


          First, note that the weight is set to $rg->threshold, that is, the threshold stored in civicrm_dedupe_rule_group. In other words, if this SQL matches a pair, those records automatically meet the threshold for that rule. Hopefully that answers your main question! If you remove that column (or replace $rg->threshold with a number), you can run this SQL in a SQL client and get a complete list of the duplicates it would return.
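          To try the query outside CiviCRM, here is a minimal sketch using Python's built-in sqlite3 module. Assumptions: both tables are cut down to the columns the query uses, $rg->threshold is replaced by a bound parameter, the sample contacts are made up, and the condition contact1.id < contact2.id is an addition for this sketch (as quoted, the query also pairs every contact with itself and returns each pair in both orders). SQLite compares text case-sensitively by default, unlike MySQL's usual collations, so mixed-case data may behave differently on a real site.

```python
import sqlite3
from typing import List, Tuple

# Cut-down stand-ins for civicrm_contact and civicrm_email.
SCHEMA = """
CREATE TABLE civicrm_contact (
  id INTEGER PRIMARY KEY, contact_type TEXT, first_name TEXT, last_name TEXT);
CREATE TABLE civicrm_email (
  id INTEGER PRIMARY KEY, contact_id INTEGER, email TEXT);
"""

# The quoted query with $rg->threshold bound as a parameter.
# The "contact1.id < contact2.id" condition is an addition for this sketch.
QUERY = """
SELECT contact1.id AS id1, contact2.id AS id2, ? AS weight
FROM civicrm_contact AS contact1
JOIN civicrm_email AS email1 ON email1.contact_id = contact1.id
JOIN civicrm_contact AS contact2 ON
  contact1.first_name = contact2.first_name AND
  contact1.last_name = contact2.last_name
JOIN civicrm_email AS email2 ON
  email2.contact_id = contact2.id AND
  email1.email = email2.email
WHERE contact1.contact_type = 'Individual'
  AND contact1.id < contact2.id
ORDER BY id1, id2
"""

# Made-up sample data.
CONTACTS = [
    (1, "Individual", "Ann", "Lee"),
    (2, "Individual", "Ann", "Lee"),
    (3, "Individual", "Ann", "Lee"),
    (4, "Individual", "Bob", "Ray"),
]
EMAILS = [
    (1, 1, "ann@example.org"),
    (2, 2, "ann@example.org"),
    (3, 3, "other@example.org"),
    (4, 4, "ann@example.org"),  # same email as contacts 1 and 2, different name
]


def duplicate_pairs(threshold: int) -> List[Tuple[int, int, int]]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO civicrm_contact VALUES (?, ?, ?, ?)", CONTACTS)
        conn.executemany("INSERT INTO civicrm_email VALUES (?, ?, ?)", EMAILS)
        # Every returned row carries weight == threshold: a pair either
        # matches outright or is not returned at all.
        return conn.execute(QUERY, (threshold,)).fetchall()
    finally:
        conn.close()


print(duplicate_pairs(15))  # [(1, 2, 15)]
```

          Contact 4 shares Ann Lee's email but not her name, and contact 3 has the name but not the email, so neither is returned.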



          To further clarify: unlike "regular" rules, which combine the results of several queries, each with its own weight, this runs a SINGLE query and sets the weight equal to the rule's threshold. So whether a record is a duplicate is a straight yes/no answer, based on whether the SQL finds it.
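          For contrast, here is a minimal sketch of how a "regular" rule with several weighted criteria can be scored, again with Python's sqlite3 module and made-up data. The criteria, weights and threshold are illustrative values, not those of any reserved rule. Each criterion runs its own self-join and contributes its weight to every pair it finds; pairs whose summed weight reaches the threshold count as duplicates. Summing with GROUP BY, skipping empty values and leaving out the contact_type filter are simplifications of this sketch; CiviCRM's own implementation may accumulate weights differently.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE civicrm_contact (
  id INTEGER PRIMARY KEY, contact_type TEXT, first_name TEXT, last_name TEXT);
CREATE TABLE civicrm_email (
  id INTEGER PRIMARY KEY, contact_id INTEGER, email TEXT);
"""

# (table, column, weight): illustrative values only.
CRITERIA = [
    ("civicrm_contact", "first_name", 5),
    ("civicrm_contact", "last_name", 5),
    ("civicrm_email", "email", 10),
]

CONTACTS = [
    (1, "Individual", "Ann", "Lee"),
    (2, "Individual", "Ann", "Lee"),
    (3, "Individual", "Ann", "Lee"),
    (4, "Individual", "Bob", "Ray"),
]
EMAILS = [
    (1, 1, "ann@example.org"),
    (2, 2, "ann@example.org"),
    (3, 3, "other@example.org"),
    (4, 4, "ann@example.org"),
]


def weighted_pairs(threshold: int) -> List[Tuple[int, int, int]]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO civicrm_contact VALUES (?, ?, ?, ?)", CONTACTS)
        conn.executemany("INSERT INTO civicrm_email VALUES (?, ?, ?)", EMAILS)
        # Scratch table with one row per (pair, matching criterion).
        conn.execute("CREATE TEMP TABLE dedupe (id1 INTEGER, id2 INTEGER, weight INTEGER)")
        for table, column, weight in CRITERIA:
            key = "id" if table == "civicrm_contact" else "contact_id"
            # One query per criterion. Table and column names come from the
            # trusted CRITERIA list, so formatting them into the SQL is safe here.
            conn.execute(
                f"""
                INSERT INTO dedupe (id1, id2, weight)
                SELECT DISTINCT t1.{key}, t2.{key}, ?
                FROM {table} AS t1
                JOIN {table} AS t2
                  ON t1.{column} = t2.{column} AND t1.{key} < t2.{key}
                WHERE t1.{column} IS NOT NULL AND t1.{column} <> ''
                """,
                (weight,),
            )
        # Sum the weights per pair and keep pairs that reach the threshold.
        return conn.execute(
            """
            SELECT id1, id2, SUM(weight) AS total
            FROM dedupe
            GROUP BY id1, id2
            HAVING SUM(weight) >= ?
            ORDER BY id1, id2
            """,
            (threshold,),
        ).fetchall()
    finally:
        conn.close()


print(weighted_pairs(15))  # [(1, 2, 20)]
print(weighted_pairs(10))
```

          With a threshold of 10 the email criterion alone is enough, so contact 4 (same email, different name) is paired with contacts 1 and 2, something the all-or-nothing reserved query quoted above would never return.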



          That's not to say that you can't simulate length/weight, but it's tricky. My org.agbu.optimizeddedupe rule has a SQL statement you can look at which gives the same results as this rule:



          [image: the SQL statement from org.agbu.optimizeddedupe]



          However, with the existing rule it took about 5 seconds to compare even a single submitted contact against the 165,000 existing contacts in this database. Now it's almost instantaneous.
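          The extension's SQL survives only as an image, so it is not reproduced here. As a rough illustration of why checking one submitted contact can be cheap, here is a minimal Python/sqlite3 sketch of a single-contact lookup that starts from the submitted email and then confirms the name on the owning contact, instead of self-joining the whole contact table. The query, the index idx_email_email and the function find_matches are hypothetical; this is not the extension's code.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE civicrm_contact (
  id INTEGER PRIMARY KEY, contact_type TEXT, first_name TEXT, last_name TEXT);
CREATE TABLE civicrm_email (
  id INTEGER PRIMARY KEY, contact_id INTEGER, email TEXT);
-- Hypothetical index so the email lookup does not scan the table.
CREATE INDEX idx_email_email ON civicrm_email (email);
"""

# Look up the submitted email first, then check the name on that contact.
LOOKUP = """
SELECT DISTINCT c.id
FROM civicrm_email AS e
JOIN civicrm_contact AS c ON c.id = e.contact_id
WHERE e.email = ?
  AND c.first_name = ?
  AND c.last_name = ?
  AND c.contact_type = 'Individual'
ORDER BY c.id
"""

CONTACTS = [
    (1, "Individual", "Ann", "Lee"),
    (2, "Individual", "Ann", "Lee"),
    (3, "Individual", "Ann", "Lee"),
    (4, "Individual", "Bob", "Ray"),
]
EMAILS = [
    (1, 1, "ann@example.org"),
    (2, 2, "ann@example.org"),
    (3, 3, "other@example.org"),
    (4, 4, "ann@example.org"),
]


def find_matches(first_name: str, last_name: str, email: str) -> List[int]:
    """Return ids of existing contacts matching one submitted contact."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO civicrm_contact VALUES (?, ?, ?, ?)", CONTACTS)
        conn.executemany("INSERT INTO civicrm_email VALUES (?, ?, ?)", EMAILS)
        rows = conn.execute(LOOKUP, (email, first_name, last_name)).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


print(find_matches("Ann", "Lee", "ann@example.org"))  # [1, 2]
print(find_matches("Bob", "Ray", "ann@example.org"))  # [4]
```

          Rebuilding the in-memory database on every call only keeps the sketch self-contained; a real check would query the existing tables.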






          edited Apr 5 at 16:20
          answered Apr 5 at 2:58 by Jon G - Megaphone Tech

























          • Thanks. Good lord...

            – Demerit
            Apr 5 at 4:16











          • Jon G - that's all very interesting, but a bit beyond me. I don't think I need to optimize my own dedupe rules, and I know I can make my own rules supervised or unsupervised and leave the preconfigured rules unused as general. What I was trying to do was understand the weights/thresholds for the preconfigured rules. I don't see any values in the code etc. that you have pointed to, so do they still come from the query that Demerit provided? They are certainly compatible with the results that I see.

            – Mick Kahn
            Apr 5 at 8:45











          • I'm still a bit confused, as I'm not that good at reading the code, though I can work out enough to see that these rules are treated as a special case. I can't see any values for length, weight or threshold there, so I still don't know the actual criteria for considering contacts to be duplicates under these rules. Or is some different algorithm used? Given that I have a relatively small number of contacts, I can just ditch the preconfigured rules and use my own (less efficient) ones. But it would be nice to know the criteria used, in order to understand why some duplicates have been created.

            – Mick Kahn
            Apr 5 at 14:00












          • @MickKahn I just updated my answer; hopefully it makes things clearer!

            – Jon G - Megaphone Tech
            Apr 5 at 16:20











          • Thanks Jon. I don't understand it all, but I see that my answer is not helpful, so I will delete it. I'm seeing some other unexpected results, but I can achieve what I need with my own rules, so I will move on to other things for now.

            – Mick Kahn
            2 days ago
































          EDIT: This answer is wrong; see Jon's answer. The reserved rules don't use the values in the database; they use custom queries.




          If you have access to the database, run:



          SELECT * from civicrm_dedupe_rule r inner join civicrm_dedupe_rule_group rg on rg.id = r.dedupe_rule_group_id;



          which will give you a table which isn't pretty but is mostly understandable.
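          A tidier variant of that query selects only the columns that describe each rule. It is wrapped here in a small Python/sqlite3 demo with one made-up, non-reserved rule group so it runs anywhere; the LISTING text itself is plain SQL that should also work in a MySQL client. The column names reflect my understanding of the CiviCRM schema and may differ between versions. Per the EDIT above, rows belonging to reserved rule groups (is_reserved = 1) do not reflect how those rules actually match.

```python
import sqlite3

# Minimal stand-ins with the columns the listing uses (assumed schema).
SCHEMA = """
CREATE TABLE civicrm_dedupe_rule_group (
  id INTEGER PRIMARY KEY, contact_type TEXT, threshold INTEGER,
  used TEXT, name TEXT, title TEXT, is_reserved INTEGER);
CREATE TABLE civicrm_dedupe_rule (
  id INTEGER PRIMARY KEY, dedupe_rule_group_id INTEGER,
  rule_table TEXT, rule_field TEXT, rule_length INTEGER, rule_weight INTEGER);
"""

# One row per rule, labelled with its group's title, mode and threshold.
LISTING = """
SELECT rg.title, rg.used, rg.threshold, rg.is_reserved,
       r.rule_table, r.rule_field, r.rule_length, r.rule_weight
FROM civicrm_dedupe_rule AS r
INNER JOIN civicrm_dedupe_rule_group AS rg ON rg.id = r.dedupe_rule_group_id
ORDER BY rg.id, r.id
"""


def list_rules():
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        # One made-up, non-reserved rule group with two weighted criteria.
        conn.execute(
            "INSERT INTO civicrm_dedupe_rule_group VALUES (?, ?, ?, ?, ?, ?, ?)",
            (1, "Individual", 15, "Supervised", "MyCustomRule", "My custom rule", 0),
        )
        conn.executemany(
            "INSERT INTO civicrm_dedupe_rule VALUES (?, ?, ?, ?, ?, ?)",
            [
                (1, 1, "civicrm_contact", "first_name", None, 5),
                (2, 1, "civicrm_email", "email", None, 10),
            ],
        )
        return conn.execute(LISTING).fetchall()
    finally:
        conn.close()


for row in list_rules():
    print(row)
```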































          • Ah, you're right. I'll update my answer.

            – Demerit
            Apr 4 at 18:19











          • Thanks, that tells me what I need, so I have deleted my earlier comment on the previous version of your answer.

            – Mick Kahn
            Apr 4 at 20:22











          • I'm taking off the green tick because Demerit has marked the answer as wrong, but I'm not offering the tick to Jon's answer yet, as it doesn't fully answer my question (see my comments on Jon's answer).

            – Mick Kahn
            Apr 5 at 13:38











          Your Answer








          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "605"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          noCode: true, onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcivicrm.stackexchange.com%2fquestions%2f29155%2freserved-de-dupe-rules%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          2 Answers
          2






          active

          oldest

          votes








          2 Answers
          2






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          4














          Demerit's and Mick's answers are incorrect for the (built-in) reserved rules - though it's definitely confusing!



          If a RuleGroup has a value in the name field, and that name corresponds to a filename in CRM/Dedupe/BAO/QueryBuilder, then the customized SQL in those files will be used. The existing entries in civicrm_rule for those RuleGroups are holdovers from before that system existed, and editing them has no effect.



          "Standard" dedupe rules with multiple criteria are very inefficient compared to handwritten SQL, which is why this is a valuable technique. You can create your own handwritten queries with hook_civicrm_dedupe, and the Veda dedupe extension has a number of excellent examples. Note that this extension doesn't work on modern Civi because of some of its other functions, but the dedupe rules can be ripped out into something else.



          Finally - I learned just yesterday that the built-in handwritten dedupe rules seem to execute different SQL when comparing in Unsupervised/Supervised mode (a single contact) vs. General mode (find all dupes). While I haven't proved it, I suspect that if you're in the rare scenario of needing to optimize your unsupervised/supervised dedupes, creating a new class to extend CRM_Dedupe_BAO_QueryBuilder is the way to go. I just posted org.agbu.optimizeddedupe to provide an example of this.



          UPDATE: More clarification.



          To understand how the queries work, it's best to look at an example, eg IndividualUnsupervised.php.



          The internal function is used if you go to Contacts » Find and Merge Duplicate Contacts and click Use Rule. The SQL query is:





           SELECT contact1.id as id1, contact2.id as id2, $rg->threshold as weight
          FROM civicrm_contact as contact1
          JOIN civicrm_email as email1 ON email1.contact_id=contact1.id
          JOIN civicrm_contact as contact2 ON
          contact1.first_name = contact2.first_name AND
          contact1.last_name = contact2.last_name
          JOIN civicrm_email as email2 ON
          email2.contact_id=contact2.id AND
          email1.email=email2.email
          WHERE contact1.contact_type = 'Individual'"


          First, note that the weight is set to $rg->threshold - that is, the threshold in civicrm_rule_group. In other words, if this SQL matches, these records automatically meet the threshold for that rule. Hopefully that answers your main question! If you remove that field, you can run this SQL as-is in a SQL client and get a complete list of the duplicates it would return.



          To further clarify - unlike "regular" rules which are the result of several queries, each with their own weight - this runs a SINGLE query, and sets the weight equal to the rule's threshold. So it's a straight yes/no answer whether a record is a duplicate, based on whether the SQL finds them.



          That's not to say that you can't simulate length/weight, but it's tricky. My org.agbu.optimizeddedupe rule has a SQL statement you can look at which gives the same results as this rule:



          enter image description here



          However, it took about 5 seconds to compare even a single submitted contact against the existing 165,000 contacts in this databse with the existing rule. Now it's almost instantaneous.






          share|improve this answer

























          • Thanks. Good lord...

            – Demerit
            Apr 5 at 4:16











          • Jon G - that's all very interesting, but a bit beyond me. I don't think I need to optimize my own dedupe rules and I know I can make my own rules supervised or unsupervised and leave the prefigured rules unused as general. What I was trying to do was understand the weights/thresholds for the preconfigured rules. I don't see any values in the code etc that you have pointed to, so do they still come from the queries that Deremit has provided. They are certain;y compatible with the results that I see.

            – Mick Kahn
            Apr 5 at 8:45











          • I'm still a bit confused as I'm not that good at reading the code, though I can work out enough see that these rules are treated a special case. I can't see any values for length, weight or threshold there so still don't know the actual criteria for considering contacts to be duplicate under these rules. Or is some different algorithm used. Given that I have a relatively small number of contacts, I can just ditch the pre-configured rules and use my own (less efficient) ones. But it would be nice to know the criteria used in order to understand why some duplicates have been created.

            – Mick Kahn
            Apr 5 at 14:00












          • @MickKahn I just updated my answer, hopefully it makes things clearer!

            – Jon G - Megaphone Tech
            Apr 5 at 16:20











          • Thanks Jon - I don't understand it all, but see that my answer is not helpful so will delete that. I'm seeing some other unexpected results, but can acheive what I need with my own rules, so will move on to other things for now.

            – Mick Kahn
            2 days ago















          4














          Demerit's and Mick's answers are incorrect for the (built-in) reserved rules - though it's definitely confusing!



          If a RuleGroup has a value in the name field, and that name corresponds to a filename in CRM/Dedupe/BAO/QueryBuilder, then the customized SQL in those files will be used. The existing entries in civicrm_rule for those RuleGroups are holdovers from before that system existed, and editing them has no effect.



          "Standard" dedupe rules with multiple criteria are very inefficient compared to handwritten SQL, which is why this is a valuable technique. You can create your own handwritten queries with hook_civicrm_dedupe, and the Veda dedupe extension has a number of excellent examples. Note that this extension doesn't work on modern Civi because of some of its other functions, but the dedupe rules can be ripped out into something else.



          Finally - I learned just yesterday that the built-in handwritten dedupe rules seem to execute different SQL when comparing in Unsupervised/Supervised mode (a single contact) vs. General mode (find all dupes). While I haven't proved it, I suspect that if you're in the rare scenario of needing to optimize your unsupervised/supervised dedupes, creating a new class to extend CRM_Dedupe_BAO_QueryBuilder is the way to go. I just posted org.agbu.optimizeddedupe to provide an example of this.



          UPDATE: More clarification.



          To understand how the queries work, it's best to look at an example, eg IndividualUnsupervised.php.



          The internal function is used if you go to Contacts » Find and Merge Duplicate Contacts and click Use Rule. The SQL query is:





           SELECT contact1.id as id1, contact2.id as id2, $rg->threshold as weight
          FROM civicrm_contact as contact1
          JOIN civicrm_email as email1 ON email1.contact_id=contact1.id
          JOIN civicrm_contact as contact2 ON
          contact1.first_name = contact2.first_name AND
          contact1.last_name = contact2.last_name
          JOIN civicrm_email as email2 ON
          email2.contact_id=contact2.id AND
          email1.email=email2.email
          WHERE contact1.contact_type = 'Individual'"


          First, note that the weight is set to $rg->threshold - that is, the threshold in civicrm_rule_group. In other words, if this SQL matches, these records automatically meet the threshold for that rule. Hopefully that answers your main question! If you remove that field, you can run this SQL as-is in a SQL client and get a complete list of the duplicates it would return.



          To further clarify - unlike "regular" rules which are the result of several queries, each with their own weight - this runs a SINGLE query, and sets the weight equal to the rule's threshold. So it's a straight yes/no answer whether a record is a duplicate, based on whether the SQL finds them.



          That's not to say that you can't simulate length/weight, but it's tricky. My org.agbu.optimizeddedupe rule has a SQL statement you can look at which gives the same results as this rule:



          enter image description here



          However, it took about 5 seconds to compare even a single submitted contact against the existing 165,000 contacts in this databse with the existing rule. Now it's almost instantaneous.






          share|improve this answer

























          • Thanks. Good lord...

            – Demerit
            Apr 5 at 4:16











          • Jon G - that's all very interesting, but a bit beyond me. I don't think I need to optimize my own dedupe rules and I know I can make my own rules supervised or unsupervised and leave the prefigured rules unused as general. What I was trying to do was understand the weights/thresholds for the preconfigured rules. I don't see any values in the code etc that you have pointed to, so do they still come from the queries that Deremit has provided. They are certain;y compatible with the results that I see.

            – Mick Kahn
            Apr 5 at 8:45











          • I'm still a bit confused as I'm not that good at reading the code, though I can work out enough see that these rules are treated a special case. I can't see any values for length, weight or threshold there so still don't know the actual criteria for considering contacts to be duplicate under these rules. Or is some different algorithm used. Given that I have a relatively small number of contacts, I can just ditch the pre-configured rules and use my own (less efficient) ones. But it would be nice to know the criteria used in order to understand why some duplicates have been created.

            – Mick Kahn
            Apr 5 at 14:00












          • @MickKahn I just updated my answer, hopefully it makes things clearer!

            – Jon G - Megaphone Tech
            Apr 5 at 16:20











          • Thanks Jon - I don't understand it all, but see that my answer is not helpful so will delete that. I'm seeing some other unexpected results, but can acheive what I need with my own rules, so will move on to other things for now.

            – Mick Kahn
            2 days ago













          4












          4








          4







          Demerit's and Mick's answers are incorrect for the (built-in) reserved rules - though it's definitely confusing!



          If a RuleGroup has a value in the name field, and that name corresponds to a filename in CRM/Dedupe/BAO/QueryBuilder, then the customized SQL in those files will be used. The existing entries in civicrm_rule for those RuleGroups are holdovers from before that system existed, and editing them has no effect.



          "Standard" dedupe rules with multiple criteria are very inefficient compared to handwritten SQL, which is why this is a valuable technique. You can create your own handwritten queries with hook_civicrm_dedupe, and the Veda dedupe extension has a number of excellent examples. Note that this extension doesn't work on modern Civi because of some of its other functions, but the dedupe rules can be ripped out into something else.



          Finally - I learned just yesterday that the built-in handwritten dedupe rules seem to execute different SQL when comparing in Unsupervised/Supervised mode (a single contact) vs. General mode (find all dupes). While I haven't proved it, I suspect that if you're in the rare scenario of needing to optimize your unsupervised/supervised dedupes, creating a new class to extend CRM_Dedupe_BAO_QueryBuilder is the way to go. I just posted org.agbu.optimizeddedupe to provide an example of this.



          UPDATE: More clarification.



          To understand how the queries work, it's best to look at an example, eg IndividualUnsupervised.php.



          The internal function is used if you go to Contacts » Find and Merge Duplicate Contacts and click Use Rule. The SQL query is:





           SELECT contact1.id as id1, contact2.id as id2, $rg->threshold as weight
          FROM civicrm_contact as contact1
          JOIN civicrm_email as email1 ON email1.contact_id=contact1.id
          JOIN civicrm_contact as contact2 ON
          contact1.first_name = contact2.first_name AND
          contact1.last_name = contact2.last_name
          JOIN civicrm_email as email2 ON
          email2.contact_id=contact2.id AND
          email1.email=email2.email
          WHERE contact1.contact_type = 'Individual'"


          First, note that the weight is set to $rg->threshold - that is, the threshold in civicrm_rule_group. In other words, if this SQL matches, these records automatically meet the threshold for that rule. Hopefully that answers your main question! If you remove that field, you can run this SQL as-is in a SQL client and get a complete list of the duplicates it would return.



          To further clarify - unlike "regular" rules which are the result of several queries, each with their own weight - this runs a SINGLE query, and sets the weight equal to the rule's threshold. So it's a straight yes/no answer whether a record is a duplicate, based on whether the SQL finds them.



          That's not to say that you can't simulate length/weight, but it's tricky. My org.agbu.optimizeddedupe rule has a SQL statement you can look at which gives the same results as this rule:



          enter image description here



          However, it took about 5 seconds to compare even a single submitted contact against the existing 165,000 contacts in this databse with the existing rule. Now it's almost instantaneous.






          share|improve this answer















          Demerit's and Mick's answers are incorrect for the (built-in) reserved rules - though it's definitely confusing!



          If a RuleGroup has a value in the name field, and that name corresponds to a filename in CRM/Dedupe/BAO/QueryBuilder, then the customized SQL in those files will be used. The existing entries in civicrm_rule for those RuleGroups are holdovers from before that system existed, and editing them has no effect.



          "Standard" dedupe rules with multiple criteria are very inefficient compared to handwritten SQL, which is why this is a valuable technique. You can create your own handwritten queries with hook_civicrm_dedupe, and the Veda dedupe extension has a number of excellent examples. Note that this extension doesn't work on modern Civi because of some of its other functions, but the dedupe rules can be ripped out into something else.



          Finally - I learned just yesterday that the built-in handwritten dedupe rules seem to execute different SQL when comparing in Unsupervised/Supervised mode (a single contact) vs. General mode (find all dupes). While I haven't proved it, I suspect that if you're in the rare scenario of needing to optimize your unsupervised/supervised dedupes, creating a new class to extend CRM_Dedupe_BAO_QueryBuilder is the way to go. I just posted org.agbu.optimizeddedupe to provide an example of this.



          UPDATE: More clarification.



          To understand how the queries work, it's best to look at an example, eg IndividualUnsupervised.php.



          The internal function is used if you go to Contacts » Find and Merge Duplicate Contacts and click Use Rule. The SQL query is:





           SELECT contact1.id as id1, contact2.id as id2, $rg->threshold as weight
          FROM civicrm_contact as contact1
          JOIN civicrm_email as email1 ON email1.contact_id=contact1.id
          JOIN civicrm_contact as contact2 ON
          contact1.first_name = contact2.first_name AND
          contact1.last_name = contact2.last_name
          JOIN civicrm_email as email2 ON
          email2.contact_id=contact2.id AND
          email1.email=email2.email
          WHERE contact1.contact_type = 'Individual'"


          First, note that the weight is set to $rg->threshold - that is, the threshold in civicrm_rule_group. In other words, if this SQL matches, these records automatically meet the threshold for that rule. Hopefully that answers your main question! If you remove that field, you can run this SQL as-is in a SQL client and get a complete list of the duplicates it would return.



          To further clarify - unlike "regular" rules which are the result of several queries, each with their own weight - this runs a SINGLE query, and sets the weight equal to the rule's threshold. So it's a straight yes/no answer whether a record is a duplicate, based on whether the SQL finds them.



          That's not to say that you can't simulate length/weight, but it's tricky. My org.agbu.optimizeddedupe rule has a SQL statement you can look at which gives the same results as this rule:



          enter image description here



          However, it took about 5 seconds to compare even a single submitted contact against the existing 165,000 contacts in this databse with the existing rule. Now it's almost instantaneous.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Apr 5 at 16:20

























          answered Apr 5 at 2:58









          Jon G - Megaphone TechJon G - Megaphone Tech

          27.5k11872




          27.5k11872












          • Thanks. Good lord...

            – Demerit
            Apr 5 at 4:16











          • Jon G - that's all very interesting, but a bit beyond me. I don't think I need to optimize my own dedupe rules and I know I can make my own rules supervised or unsupervised and leave the prefigured rules unused as general. What I was trying to do was understand the weights/thresholds for the preconfigured rules. I don't see any values in the code etc that you have pointed to, so do they still come from the queries that Deremit has provided. They are certain;y compatible with the results that I see.

            – Mick Kahn
            Apr 5 at 8:45











          • I'm still a bit confused as I'm not that good at reading the code, though I can work out enough see that these rules are treated a special case. I can't see any values for length, weight or threshold there so still don't know the actual criteria for considering contacts to be duplicate under these rules. Or is some different algorithm used. Given that I have a relatively small number of contacts, I can just ditch the pre-configured rules and use my own (less efficient) ones. But it would be nice to know the criteria used in order to understand why some duplicates have been created.

            – Mick Kahn
            Apr 5 at 14:00












          • @MickKahn I just updated my answer, hopefully it makes things clearer!

            – Jon G - Megaphone Tech
            Apr 5 at 16:20











          • Thanks Jon - I don't understand it all, but I see that my answer is not helpful, so I will delete it. I'm seeing some other unexpected results, but I can achieve what I need with my own rules, so I will move on to other things for now.

            – Mick Kahn
            2 days ago































          EDIT: This answer is wrong; see Jon's answer. The reserved rules don't use the values stored in the database; they use custom queries.




          If you have access to the database, run:



          SELECT * FROM civicrm_dedupe_rule r INNER JOIN civicrm_dedupe_rule_group rg ON rg.id = r.dedupe_rule_group_id;



          This gives you a table that isn't pretty but is mostly understandable.
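Building on the query above, a possibly more readable variant groups the rows by rule group and puts each group's threshold next to the summed weight of its field rules. The sketch below runs that variant against a tiny in-memory SQLite mock so it can be tried without a CiviCRM install. The table layout is my assumption about the CiviCRM schema, beyond the table names and the `id`/`dedupe_rule_group_id` join used in the original query. That includes the columns `threshold`, `used`, `name`, `is_reserved`, `rule_table`, `rule_field`, `rule_length` and `rule_weight`, which may differ between CiviCRM versions. The example rows are invented. Per the EDIT above, whatever values appear for reserved groups don't describe how those rules actually match.

```python
import sqlite3

# Simplified, hypothetical stand-ins for CiviCRM's dedupe tables. The real MySQL
# tables have more columns, and column names may vary between CiviCRM versions.
SCHEMA = """
CREATE TABLE civicrm_dedupe_rule_group (
    id INTEGER PRIMARY KEY,
    contact_type TEXT,
    threshold INTEGER,
    used TEXT,
    name TEXT,
    is_reserved INTEGER
);
CREATE TABLE civicrm_dedupe_rule (
    id INTEGER PRIMARY KEY,
    dedupe_rule_group_id INTEGER REFERENCES civicrm_dedupe_rule_group(id),
    rule_table TEXT,
    rule_field TEXT,
    rule_length INTEGER,
    rule_weight INTEGER
);
"""

# One row per rule group: its threshold next to the summed weight of its field rules.
SUMMARY_SQL = """
SELECT rg.name, rg.used, rg.threshold, rg.is_reserved,
       SUM(r.rule_weight) AS total_weight
FROM civicrm_dedupe_rule_group rg
LEFT JOIN civicrm_dedupe_rule r ON r.dedupe_rule_group_id = rg.id
GROUP BY rg.id, rg.name, rg.used, rg.threshold, rg.is_reserved
ORDER BY rg.id;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one made-up, non-reserved rule group."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO civicrm_dedupe_rule_group "
        "VALUES (1, 'Individual', 10, 'Supervised', 'ExampleNameEmail', 0)"
    )
    conn.executemany(
        "INSERT INTO civicrm_dedupe_rule "
        "(dedupe_rule_group_id, rule_table, rule_field, rule_length, rule_weight) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (1, "civicrm_contact", "first_name", 3, 5),
            (1, "civicrm_contact", "last_name", None, 7),
            (1, "civicrm_email", "email", None, 10),
        ],
    )
    return conn


def summarize(conn: sqlite3.Connection) -> list:
    """Return (name, used, threshold, is_reserved, total_weight) for each rule group."""
    return conn.execute(SUMMARY_SQL).fetchall()


for name, used, threshold, is_reserved, total in summarize(build_demo_db()):
    print(f"{name} ({used}): threshold {threshold}, total weight {total}, reserved={bool(is_reserved)}")
```

The SELECT in `SUMMARY_SQL` is plain SQL and may run unchanged against a real CiviCRM MySQL database, but that is untested here, so check the column names in your version first.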














          edited Apr 5 at 12:29

























          answered Apr 4 at 17:35









          Demerit













          • Ah, you're right. I'll update my answer.

            – Demerit
            Apr 4 at 18:19











          • Thanks, that tells me what I need, so I have deleted my earlier comment on your previous version of the answer.

            – Mick Kahn
            Apr 4 at 20:22











          • I'm taking off the green tick because Demerit has marked the answer as wrong, but (see below) I'm not offering the tick to Jon's answer yet, as it doesn't fully answer my question.

            – Mick Kahn
            Apr 5 at 13:38
















