Note
mimansajaiswal: how does the hook based duplicate detection work before adding something?
GitHub Copilot: Based on the code shown, here's how the hook-based duplicate detection works:
- The core detection logic appears to work through several hooks and processes:
- First, the detection starts in hooks.ts through the onNotify handler, which listens for item modifications:
// ...existing code...
const toRefresh =
// subset of "modify" event (modification on item data and authors) on regular items
(extraData && Object.values(extraData).some((data) => data.refreshDuplicates)) ||
// "add" event on regular items
(type == "item" && event == "add" && containsRegularItem(ids)) ||
// "refresh" event on trash
(type == "trash" && event == "refresh");
// ...existing code...
- When duplicates need to be detected, it uses the duplicates.ts module, which patches Zotero's built-in duplicate detection:
// ...existing code...
export function patchFindDuplicates(db: NonDuplicatesDB) {
const patch = new ztoolkit.Patch();
patch.setData({
target: Zotero.Duplicates.prototype,
funcSign: "_findDuplicates",
enabled: true,
patcher: (original: any) =>
async function (this: any) {
const duplicateSets = await db.getNonDuplicates({ libraryID: this.libraryID });
NonDuplicates.getInstance().allNonDuplicates = new Set(
duplicateSets.map(({ itemID, itemID2 }) => [itemID, itemID2].sort().join(",")),
);
await original.call(this);
},
});
// ...existing code...
- The actual duplicate finding logic uses the DuplicateFinder class, which implements multiple strategies:
async find() {
await this.findByDcReplacesRelation();
await this.findByDOI();
await this.findBookByISBN();
await this.findByTitle();
await this.findByCreators();
await this.findByYear();
return this.candidateItemIDs;
}
The key aspects of the system are:
- It hooks into Zotero's event system to detect when items are added or modified
- It maintains a database of known non-duplicates to exclude from detection
- It uses multiple strategies to find duplicates, including:
  - DC Replaces relations
  - DOI matching
  - ISBN matching (for books)
  - Title matching
  - Creator matching
  - Year matching
- The system is designed to be non-destructive, allowing users to:
  - Mark false positives as non-duplicates
  - Choose which version to keep
  - Merge duplicate information rather than just deleting items
So if you want to add something new, you would need to:
- Hook into the notification system in hooks.ts for your trigger conditions
- Add any new detection logic to DuplicateFinder class
- Update the patch system if you need to modify Zotero's default behavior
- Consider adding to the non-duplicates functionality if needed
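As a rough illustration of the non-duplicates idea above, here is a minimal, self-contained sketch. The pairKey helper mirrors the [itemID, itemID2].sort().join(",") expression from the patch. ToyForest is a hypothetical stand-in written for this example, not Zotero's actual DisjointSetForest; it only shows how a union can be skipped for a pair that is known to be a non-duplicate.

```typescript
// Order-independent key for a pair of item IDs, mirroring the
// `[itemID, itemID2].sort().join(",")` expression shown in the patch.
function pairKey(a: number, b: number): string {
  return [a, b].sort().join(",");
}

// Hypothetical toy disjoint-set forest (NOT Zotero's implementation).
class ToyForest {
  private parent = new Map<number, number>();
  private nonDuplicates: Set<string>;

  constructor(nonDuplicates: Set<string>) {
    this.nonDuplicates = nonDuplicates;
  }

  // Find the representative of x, registering x on first sight.
  find(x: number): number {
    const p = this.parent.get(x);
    if (p === undefined) {
      this.parent.set(x, x);
      return x;
    }
    if (p === x) return x;
    const root = this.find(p);
    this.parent.set(x, root); // path compression
    return root;
  }

  // Join the sets of x and y unless the pair is a known non-duplicate.
  union(x: number, y: number): void {
    if (this.nonDuplicates.has(pairKey(x, y))) return;
    this.parent.set(this.find(x), this.find(y));
  }

  sameSet(x: number, y: number): boolean {
    return this.find(x) === this.find(y);
  }
}

const forest = new ToyForest(new Set([pairKey(10, 9)]));
forest.union(1, 2); // joined
forest.union(9, 10); // skipped: marked as non-duplicates
console.log(forest.sameSet(1, 2), forest.sameSet(9, 10)); // prints: true false
```

Note that the default sort() compares the IDs as strings, so 9 and 10 produce the key "10,9". That is still fine for this purpose because both orderings yield the same key. In this toy version, skipping a union only blocks the direct link, so two items could still end up in the same set through a third item.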
Note
mimansajaiswal: how is this done and decided which one is the master item?
GitHub Copilot: From the code shown, the master item selection logic is primarily handled in the DuplicateItems class. Here's how it works:
- The master item preference is configured in the preferences UI (preferences.xhtml):
// ...existing code...
<radiogroup id="zotero-prefpane-__addonRef__-master-item" preference="__prefsPrefix__.bulk.master.item"
orient="vertical">
<radio data-l10n-id="pref-default-master-item-oldest" value="oldest" />
<radio data-l10n-id="pref-default-master-item-newest" value="newest" />
<radio data-l10n-id="pref-default-master-item-modified" value="modified" />
<radio data-l10n-id="pref-default-master-item-detailed" value="detailed" />
</radiogroup>
// ...existing code...
- The actual logic for selecting the master item is in the DuplicateItems class:
// ...existing code...
private analyze() {
let compare: (a: Zotero.Item, b: Zotero.Item) => number;
switch (this._masterItemPref) {
default:
case MasterItem.OLDEST:
compare = (a: Zotero.Item, b: Zotero.Item) => (b.dateAdded < a.dateAdded ? 1 : -1);
break;
case MasterItem.NEWEST:
compare = (a: Zotero.Item, b: Zotero.Item) => (b.dateAdded > a.dateAdded ? 1 : -1);
break;
case MasterItem.MODIFIED:
compare = (a: Zotero.Item, b: Zotero.Item) => (b.dateModified > a.dateModified ? 1 : -1);
break;
case MasterItem.DETAILED:
compare = (a: Zotero.Item, b: Zotero.Item) => {
const fieldDiff = b.getUsedFields(false).length - a.getUsedFields(false).length;
if (fieldDiff !== 0) {
return fieldDiff;
}
return b.dateAdded < a.dateAdded ? 1 : -1;
};
break;
}
this._items.sort(compare);
this._masterItem = this._items[0];
}
// ...existing code...
The master item selection works as follows:
- The user can choose from 4 different strategies in preferences:
  - OLDEST: Keep the item with the earliest dateAdded
  - NEWEST: Keep the item with the latest dateAdded
  - MODIFIED: Keep the most recently modified item
  - DETAILED: Keep the item with the most filled fields (if equal, use the oldest)
- When duplicates are found, the DuplicateItems class:
  - Takes an array of duplicate items and the master item preference
  - Uses the analyze() method to sort items according to the chosen strategy
  - Sets the first item after sorting as the master item
- The comparison functions define how items are ranked:
  - Compare dates for OLDEST/NEWEST/MODIFIED strategies
  - For the DETAILED strategy:
    - First compare the number of filled fields
    - If equal, use the oldest item as tiebreaker
This provides a flexible system where:
- Users can choose their preferred strategy
- The selection is deterministic based on clear criteria
- The detailed option allows preserving the most complete record
- The selection happens automatically once the strategy is chosen
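To make the ranking concrete, here is a minimal sketch of the same idea on plain objects. The ItemLike shape, the usedFieldCount property, and the pickMaster helper are assumptions made up for this example (the real class works on Zotero.Item and calls getUsedFields), and dates are assumed to be sortable strings.

```typescript
type MasterPref = "oldest" | "newest" | "modified" | "detailed";

// Hypothetical plain-object stand-in for Zotero.Item.
interface ItemLike {
  id: number;
  dateAdded: string; // assumed sortable, e.g. "YYYY-MM-DD HH:MM:SS"
  dateModified: string;
  usedFieldCount: number; // stands in for getUsedFields(false).length
}

// Three-way string comparison: -1, 0 or 1.
const cmp = (x: string, y: string): number => (x < y ? -1 : x > y ? 1 : 0);

function pickMaster(items: ItemLike[], pref: MasterPref): ItemLike {
  if (items.length === 0) throw new Error("no items to choose from");
  let compare: (a: ItemLike, b: ItemLike) => number;
  switch (pref) {
    case "newest":
      compare = (a, b) => cmp(b.dateAdded, a.dateAdded);
      break;
    case "modified":
      compare = (a, b) => cmp(b.dateModified, a.dateModified);
      break;
    case "detailed":
      // Most filled fields first; ties fall back to the oldest item.
      compare = (a, b) =>
        b.usedFieldCount - a.usedFieldCount || cmp(a.dateAdded, b.dateAdded);
      break;
    default: // "oldest"
      compare = (a, b) => cmp(a.dateAdded, b.dateAdded);
  }
  // Sort a copy so the caller's array is left untouched.
  return [...items].sort(compare)[0];
}

const sample: ItemLike[] = [
  { id: 1, dateAdded: "2020-01-01 00:00:00", dateModified: "2023-05-01 00:00:00", usedFieldCount: 5 },
  { id: 2, dateAdded: "2021-06-01 00:00:00", dateModified: "2021-06-02 00:00:00", usedFieldCount: 9 },
  { id: 3, dateAdded: "2022-03-01 00:00:00", dateModified: "2022-03-01 00:00:00", usedFieldCount: 9 },
];

console.log(pickMaster(sample, "detailed").id); // prints: 2
```

With this sample data, "oldest" and "modified" both pick item 1, "newest" picks item 3, and "detailed" picks item 2 because items 2 and 3 tie on field count and item 2 was added earlier.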
Note
mimansajaiswal: do we use action preferences or master item preferences when a new item is added?
GitHub Copilot: From reviewing the code, when a new item is added, Zoplicate uses both the action preferences and master item preferences, but in different contexts:
- Action Preferences (duplicate.default.action) are used first to decide how to handle duplicates:
<radiogroup id="zotero-prefpane-__addonRef__-default-action"
preference="__prefsPrefix__.duplicate.default.action" orient="vertical">
<radio data-l10n-id="pref-default-action-keep-this" value="keep" />
<radio data-l10n-id="pref-default-action-keep-others" value="discard" />
<radio data-l10n-id="pref-default-action-keep-all" value="cancel" />
<radio data-l10n-id="pref-default-action-always-ask" value="ask" />
</radiogroup>
- Master Item Preferences (bulk.master.item) are used when the action involves merging:
<radiogroup id="zotero-prefpane-__addonRef__-master-item" preference="__prefsPrefix__.bulk.master.item"
orient="vertical">
<radio data-l10n-id="pref-default-master-item-oldest" value="oldest" />
<radio data-l10n-id="pref-default-master-item-newest" value="newest" />
<radio data-l10n-id="pref-default-master-item-modified" value="modified" />
<radio data-l10n-id="pref-default-master-item-detailed" value="detailed" />
</radiogroup>
The workflow appears to be:
- When a new item is added, the action preference decides whether to:
  - Keep the new item
  - Keep the existing items
  - Keep all items
  - Ask the user
- If the action involves merging (keep new/keep existing), then the master item preferences are used to determine which item becomes the master when merging, as shown in the DuplicateItems class:
private analyze() {
let compare: (a: Zotero.Item, b: Zotero.Item) => number;
switch (this._masterItemPref) {
case MasterItem.OLDEST:
compare = (a: Zotero.Item, b: Zotero.Item) => (b.dateAdded < a.dateAdded ? 1 : -1);
break;
// ...existing code...
}
this._items.sort(compare);
this._masterItem = this._items[0];
}
So both preferences work together - the action preference controls the high-level behavior, while the master item preference controls the details of merging when needed.
Note
mimansajaiswal: what does each of them do? When a new item is added, the action preference decides whether to:
Keep the new item Keep the existing items Keep all items Ask the user
GitHub Copilot: Looking at the code, here's how each action works when a new duplicate item is added:
- The actions are defined in preferences.xhtml:
<radiogroup id="zotero-prefpane-__addonRef__-default-action"
preference="__prefsPrefix__.duplicate.default.action" orient="vertical">
<radio data-l10n-id="pref-default-action-keep-this" value="keep" /> <!-- Keep new item -->
<radio data-l10n-id="pref-default-action-keep-others" value="discard" /> <!-- Keep existing items -->
<radio data-l10n-id="pref-default-action-keep-all" value="cancel" /> <!-- Keep all items -->
<radio data-l10n-id="pref-default-action-always-ask" value="ask" /> <!-- Ask user -->
</radiogroup>
When a new item is added and duplicates are detected:
- "keep" (Keep New):
  - The new item becomes the master item
  - Merges information from existing items into the new item
  - Existing duplicate items are deleted after merging
- "discard" (Keep Others):
  - Uses the DuplicateItems class to determine which existing item becomes master based on masterItemPref
  - Merges information from the new item into the master item
  - The new item is deleted after merging
- "cancel" (Keep All):
  - No merging happens
  - Both new and existing items are kept in the library
  - Items remain marked as duplicates in Zotero's duplicate detection
- "ask" (Always Ask):
  - Shows a dialog to let the user choose what to do
  - User can select any of the above actions
  - Can optionally save the choice as the new default
The actual merging logic is handled by the DuplicateItems class, which considers the masterItemPref setting to determine which item becomes the master when merging:
private analyze() {
let compare: (a: Zotero.Item, b: Zotero.Item) => number;
switch (this._masterItemPref) {
case MasterItem.OLDEST: // Keep oldest added item as master
case MasterItem.NEWEST: // Keep newest added item as master
case MasterItem.MODIFIED: // Keep most recently modified as master
case MasterItem.DETAILED: // Keep item with most fields as master
// ...existing code...
}
this._items.sort(compare);
this._masterItem = this._items[0];
}
So the action preference decides the high-level behavior (keep/discard/merge), while the master item preference determines which specific item becomes the master when merging is involved.
Note
mimansajaiswal: but keep new is becoming master, right? it is not then relying on masterItemPref to decide which one to keep? when is merge called in the action hook?
GitHub Copilot: You're asking a good question about when the merge logic actually happens. Looking at the hooks and files provided, there seems to be key information missing - we need to see how the default actions ("keep", "discard", etc.) are actually handled.
However, we can piece together the workflow from looking at the referenced code:
- When a new item is added, it triggers the notification system in hooks.ts:
// ...existing code...
if (toRefresh) {
if (type == "item" && event == "add") {
await Duplicates.instance.whenItemsAdded(duplicatesObj, ids as number[]);
}
}
- The preference UI sets up what should happen:
<radiogroup id="zotero-prefpane-__addonRef__-default-action"
preference="__prefsPrefix__.duplicate.default.action" orient="vertical">
<radio value="keep" /> <!-- Keep new item -->
<radio value="discard" /> <!-- Keep existing items -->
<radio value="cancel" /> <!-- Keep all items -->
<radio value="ask" /> <!-- Ask user -->
</radiogroup>
However, you're right - there appears to be an inconsistency here. The code suggests two things:
- For "keep" (keep new), it should just use the new item directly as master
- For "discard" (keep existing), it should use masterItemPref to choose among existing items
To properly answer your question about when merge is called and resolve this potential inconsistency, we would need to see:
- The Duplicates class implementation, specifically its whenItemsAdded method
- The code that actually handles the actions ("keep", "discard", etc.)
Can you share those files to help clarify exactly how the merge workflow happens?
This would help us understand:
- If "keep new" truly bypasses masterItemPref
- When exactly merge gets called
- How the different preferences interact with each other
Note
mimansajaiswal: here you go, now tell me what happens when an item is added
GitHub Copilot: Looking at the code, here's the flow when a new item is added:
- First, the notification system triggers in hooks.ts:
if (toRefresh) {
if (type == "item" && event == "add") {
await Duplicates.instance.whenItemsAdded(duplicatesObj, ids as number[]);
}
}
- This calls whenItemsAdded in the Duplicates class:
async whenItemsAdded(duplicatesObj, ids) {
const defaultAction = getPref("duplicate.default.action") as Action;
if (defaultAction === Action.CANCEL || ids.length === 0) {
return;
}
// Create maps of duplicates
let duplicateMaps = ids.reduce((acc, id) => {
const existingItemIDs = duplicatesObj.getSetItemsByItemID(id).filter((i: number) => i !== id);
if (existingItemIDs.length > 0) {
acc.set(id, { existingItemIDs, action: defaultAction });
}
return acc;
}, new Map());
if (defaultAction === Action.ASK) {
await this.showDuplicates(duplicateMaps);
return;
}
this.processDuplicates(duplicateMaps);
}
- When processing duplicates, it handles each action differently:
async processDuplicates(duplicateMaps) {
// ...existing code...
for (const [newItemID, { existingItemIDs, action }] of duplicateMaps) {
if (action === Action.KEEP) {
// Keep new item as master
items.push({
masterItem: newItem,
otherItems: existingItemIDs.map((id) => Zotero.Items.get(id)),
});
} else if (action === Action.DISCARD) {
// Use masterItemPref to choose master from existing items
const duplicateItems = new DuplicateItems(existingItemIDs, masterItemPref);
const masterItem = duplicateItems.masterItem;
const otherItems = duplicateItems.otherItems;
items.push({
masterItem: masterItem,
otherItems: [...otherItems, newItem],
});
}
// Action.CANCEL just returns without doing anything
}
// ...merge items...
}
So the key workflow is:
- Action.KEEP (Keep New):
  - New item becomes master
  - Existing items are merged into it
  - masterItemPref is NOT used
- Action.DISCARD (Keep Existing):
  - Uses masterItemPref to choose which existing item becomes master
  - New item and other existing items are merged into master
  - masterItemPref IS used to select master
- Action.CANCEL (Keep All):
  - No merging happens
  - All items are kept
- Action.ASK:
  - Shows a dialog for the user to choose what to do
  - Then follows one of the above paths
This explains why sometimes masterItemPref appears to be ignored - it's only used when keeping existing items (DISCARD), not when keeping the new item (KEEP).
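The branching described above can be sketched as a small pure function. This is only an illustration under stated assumptions: planMerge, MergePlan, and the chooseMaster callback are hypothetical names for this example (chooseMaster stands in for whatever DuplicateItems does with masterItemPref), and IDs are used instead of Zotero items.

```typescript
type Action = "keep" | "discard" | "cancel" | "ask";

interface MergePlan {
  masterID: number;
  otherIDs: number[];
}

// Decide which item would be the master for one newly added item.
// Returns null when nothing should be merged automatically.
function planMerge(
  action: Action,
  newItemID: number,
  existingItemIDs: number[],
  chooseMaster: (ids: number[]) => number, // hypothetical masterItemPref stand-in
): MergePlan | null {
  if (existingItemIDs.length === 0) return null; // no duplicates found
  switch (action) {
    case "keep":
      // The new item is the master; the master preference is not consulted.
      return { masterID: newItemID, otherIDs: [...existingItemIDs] };
    case "discard": {
      // The master preference picks among EXISTING items only.
      const masterID = chooseMaster(existingItemIDs);
      const rest = existingItemIDs.filter((id) => id !== masterID);
      return { masterID, otherIDs: [...rest, newItemID] };
    }
    default:
      // "cancel" keeps everything; "ask" defers to a dialog.
      return null;
  }
}

// Pretend the lowest ID is the preferred master among existing items.
const lowestID = (ids: number[]): number => Math.min(...ids);

console.log(planMerge("keep", 10, [3, 7], lowestID));
console.log(planMerge("discard", 10, [3, 7], lowestID));
```

In the first call the new item 10 is the master and items 3 and 7 are merged into it. In the second call item 3 is the master, and item 7 plus the new item 10 are merged into it.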
Note
mimansajaiswal: in this case, the new item can never become the master right? Action.DISCARD (Keep Existing):
Uses masterItemPref to choose which existing item becomes master New item and other existing items are merged into master masterItemPref IS used to select master
GitHub Copilot: Looking at the code in duplicates.ts, this isn't correct - there are actually two separate paths:
async processDuplicates(duplicateMaps) {
// ...existing code...
for (const [newItemID, { existingItemIDs, action }] of duplicateMaps) {
if (action === Action.KEEP) {
// When keeping new item, it becomes the master directly
items.push({
masterItem: newItem, // New item is master
otherItems: existingItemIDs.map((id) => Zotero.Items.get(id)),
});
} else if (action === Action.DISCARD) {
// When discarding new item, use masterItemPref to choose from existing
const duplicateItems = new DuplicateItems(existingItemIDs, masterItemPref);
const masterItem = duplicateItems.masterItem; // Existing item chosen by pref becomes master
const otherItems = duplicateItems.otherItems;
items.push({
masterItem: masterItem,
otherItems: [...otherItems, newItem], // New item becomes part of otherItems
});
}
}
// ...existing code...
}
So actually:
- With Action.KEEP:
  - The new item DOES become the master
  - masterItemPref is NOT used
  - Existing items are merged into the new item
- With Action.DISCARD:
  - The new item NEVER becomes the master
  - masterItemPref IS used to choose among existing items
  - New item gets merged into the chosen master along with other existing items
So I was incorrect in my earlier analysis - the new item CAN become the master, but only when Action.KEEP is chosen. The masterItemPref is only used when keeping existing items (Action.DISCARD).
Note
mimansajaiswal: how are the duplicates found when action.keep is used? or how are they found when action.discard is used?
GitHub Copilot: Looking at the code, the duplicate finding process is actually the same for both Action.KEEP and Action.DISCARD. The difference is only in how they're handled after being found. Here's the process:
- First, the duplicates are found in whenItemsAdded():
async whenItemsAdded(duplicatesObj, ids) {
// ...existing code...
// Find duplicates for each new item
const duplicateMaps = ids.reduce((acc, id) => {
const existingItemIDs = duplicatesObj.getSetItemsByItemID(id)
.filter((i: number) => i !== id); // Get all items except the new one
if (existingItemIDs.length > 0) {
acc.set(id, { existingItemIDs, action: defaultAction });
}
return acc;
}, new Map());
}
- The key search for duplicates happens in the Zotero.Duplicates class, which is patched in patcher.ts:
export function patchFindDuplicates(db: NonDuplicatesDB) {
patch.setData({
target: Zotero.Duplicates.prototype,
funcSign: "_findDuplicates",
// ...existing code...
});
}
- The actual duplicate finding is done using multiple strategies in DuplicateFinder:
async find() {
await this.findByDcReplacesRelation(); // Check dc:replaces relations
await this.findByDOI(); // Match DOIs
await this.findBookByISBN(); // Match ISBNs for books
await this.findByTitle(); // Match titles
await this.findByCreators(); // Match creators
await this.findByYear(); // Match publication years
return this.candidateItemIDs;
}
So the duplicate finding process is identical regardless of whether Action.KEEP or Action.DISCARD is chosen. The only difference comes after duplicates are found, in how they are processed:
async processDuplicates() {
// ...existing code...
if (action === Action.KEEP) {
// New item becomes master directly
items.push({
masterItem: newItem,
otherItems: existingItemIDs.map((id) => Zotero.Items.get(id)),
});
} else if (action === Action.DISCARD) {
// Use masterItemPref to pick master from existing items
const duplicateItems = new DuplicateItems(existingItemIDs, masterItemPref);
const masterItem = duplicateItems.masterItem;
items.push({
masterItem: masterItem,
otherItems: [...duplicateItems.otherItems, newItem],
});
}
}
The detection process is the same - what differs is only how the duplicates are handled after being found.
Note
mimansajaiswal: how is it patched? how is the duplicate finding done? explain this to me in detail. something is not working correctly with duplicate finding.
GitHub Copilot: Let me break down how the duplicate finding process works through the patches and code:
- First, there's a patch to Zotero's duplicate finding system in patcher.ts:
export function patchFindDuplicates(db: NonDuplicatesDB) {
patch.setData({
target: Zotero.Duplicates.prototype,
funcSign: "_findDuplicates",
patcher: (original: any) =>
async function (this: any) {
// Get items marked as non-duplicates
const duplicateSets = await db.getNonDuplicates({ libraryID: this.libraryID });
NonDuplicates.getInstance().allNonDuplicates = new Set(
duplicateSets.map(({ itemID, itemID2 }) => [itemID, itemID2].sort().join(","))
);
await original.call(this);
},
});
// This is the key patch that prevents certain items from being marked as duplicates
patch.setData({
target: Zotero.DisjointSetForest.prototype,
funcSign: "union",
patcher: (original) =>
function (this: any, x: { id: number }, y: { id: number }) {
const allNonDuplicates = NonDuplicates.getInstance().allNonDuplicates;
const pair = [x.id, y.id].sort().join(",");
if (allNonDuplicates.has(pair)) {
return; // Skip marking these as duplicates if they're in non-duplicates list
}
original.call(this, x, y);
},
});
}
- The actual duplicate finding happens in DuplicateFinder.ts, which uses multiple strategies:
async find() {
await this.findByDcReplacesRelation(); // Check DC relations
await this.findByDOI(); // Check DOIs
await this.findBookByISBN(); // Check ISBNs for books
await this.findByTitle(); // Check titles
await this.findByCreators(); // Check authors
await this.findByYear(); // Check publication year
return this.candidateItemIDs;
}
- When a new item is added, the hook system triggers in hooks.ts:
if (toRefresh) {
if (type == "item" && event == "add") {
const { duplicatesObj } = await fetchDuplicates({ libraryID, refresh: true });
await Duplicates.instance.whenItemsAdded(duplicatesObj, ids as number[]);
}
}
The potential issue I see is in whenItemsAdded in duplicates.ts:
async whenItemsAdded(duplicatesObj, ids) {
// ...existing code...
let duplicateItemMap = new Map<number, DuplicateItems>();
for (const id of ids) {
const items = duplicatesObj.getSetItemsByItemID(id);
if (items.length < 2) {
continue;
}
const duplicateItems = new DuplicateItems(items, getPref("bulk.master.item") as MasterItem);
duplicateItemMap.set(duplicateItems.key, duplicateItems);
}
// This second map creation seems redundant and could be causing issues
const duplicateMaps = ids.reduce((acc, id) => {
const existingItemIDs = duplicatesObj.getSetItemsByItemID(id).filter((i: number) => i !== id);
if (existingItemIDs.length > 0) {
acc.set(id, { existingItemIDs, action: defaultAction });
}
return acc;
}, new Map());
}
The issues I see:
- Two different maps are being created for the same data
- The first map uses masterItemPref but isn't actually used
- The second map ignores masterItemPref initially
- The selection of master item depends entirely on which action is chosen later
To fix this, you probably want to:
- Remove the redundant first map creation
- Consider masterItemPref during the initial duplicate detection phase
- Ensure consistent handling of master item selection across all paths
Would you like me to propose specific code changes to address these issues?
Note
mimansajaiswal: explain how duplicates are currently found in duplicatefinder
GitHub Copilot: Looking at the DuplicateFinder class, duplicates are found through a series of sequential methods, each using different strategies:
async find() {
// 1. Check DC Replace Relations
await this.findByDcReplacesRelation();
// 2. Try DOI matching if step 1 didn't find single match
await this.findByDOI();
// 3. Try ISBN matching for books
await this.findBookByISBN();
// 4. Try title matching
await this.findByTitle();
// 5. Try creator matching
await this.findByCreators();
// 6. Try year matching
await this.findByYear();
return this.candidateItemIDs;
}
The key aspects of each method:
findByDcReplacesRelation:
private async findByDcReplacesRelation() {
const predicate = Zotero.Relations.replacedItemPredicate;
this.candidateItemIDs = await DuplicateFinder.findByRelations(this.item, predicate, true);
}
findByDOI:
private async findByDOI() {
if (this.candidateItemIDs.length === 1) return; // Stop if already found exact match
const dois = cleanDOI(this.item);
// SQL query to find items with matching DOI in DOI/URL/extra fields
const query = `SELECT DISTINCT itemID
FROM itemDataValues
WHERE ... AND (${partialWhereClause}) ${candidateAndClause};`;
findBookByISBN:
private async findBookByISBN() {
if (this.itemTypeID !== Zotero.ItemTypes.getID("book")) return;
const isbns = cleanISBN(this.item);
// SQL query to find books with matching ISBN
findByTitle:
private async findByTitle() {
const titles = unique([
normalizeString(this.item.getDisplayTitle()),
normalizeString(this.item.getField("title")),
]);
// SQL query to find items with matching normalized titles
findByCreators:
private async findByCreators() {
// Only checks primary creator type (e.g., author for articles)
const creators = this.item.getCreators()
.filter((creator) => creator.creatorTypeID === primaryCreatorTypeID)
.map((creator) => cleanCreator(creator));
// SQL query to find items with matching creator names
findByYear:
private async findByYear(threshold = 1) {
// Looks for items within +/- 1 year
const minYear = year - threshold;
const maxYear = year + threshold;
// SQL query to find items in year range
Important notes:
- Methods are run in sequence from most specific to least specific
- Each method adds to or filters the candidateItemIDs array
- Most methods (except the first) have early exits if a single match was already found
- SQL queries use item type matching to prevent cross-type duplicates
- The methods build upon each other with candidateAndClause filtering
This sequential approach means that duplicate detection becomes increasingly lenient if exact matches aren't found, while still maintaining reasonable constraints through item type and field matching.
Note
mimansajaiswal: where all does item type matching happen in the codebase?
GitHub Copilot: Looking through the codebase, item type matching happens in several places:
- During SQL queries in DuplicateFinder:
private async findByDOI() {
// ...existing code...
const query = `SELECT DISTINCT itemID
FROM itemDataValues
WHERE deletedItems.itemID IS NULL
AND libraryID = ?
AND itemTypeID = ? -- item type is matched here
AND fieldID IN (${fieldIDInClause})`;
- During merge operations in merger.ts:
export async function merge(masterItem: Zotero.Item, otherItems: Zotero.Item[]) {
const masterItemType = masterItem.itemTypeID;
// Check if any items need type conversion
const hasMismatch = otherItems.some(item => item.itemTypeID !== masterItemType);
if (hasMismatch) {
const typeMismatchPref = getPref("duplicate.type.mismatch") as TypeMismatch;
// Handle type mismatches according to preferences:
// - SKIP: Filter out mismatched items
// - CONVERT: Convert items to master type
// - ASK: Show dialog asking user what to do
- The type mismatch preferences in preferences.xhtml:
<radiogroup id="zotero-prefpane-__addonRef__-type-mismatch"
preference="__prefsPrefix__.duplicate.type.mismatch" orient="vertical">
<radio data-l10n-id="pref-type-mismatch-skip" value="skip" />
<radio data-l10n-id="pref-type-mismatch-convert" value="convert" />
<radio data-l10n-id="pref-type-mismatch-ask" value="ask" />
</radiogroup>
So type matching serves two purposes:
- During duplicate detection - ensuring only same-type items are considered duplicates
- During merging - handling cases where item types don't match through user preferences
The system is designed to be strict about type matching during detection, but flexible during merging based on user preferences.
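As a hedged illustration of the merge-time handling, the sketch below splits the non-master items by whether their type matches the master. The TypedItem shape and the partitionByType helper are invented for this example, and the "ask" option is left out because it depends on a dialog; the actual merge function may be organized differently.

```typescript
// "ask" is omitted here because it requires user interaction.
type TypeMismatchPref = "skip" | "convert";

// Hypothetical minimal item shape for this example.
interface TypedItem {
  id: number;
  itemTypeID: number;
}

interface MergeInput {
  toMerge: TypedItem[]; // items that would be merged into the master
  toConvert: TypedItem[]; // items whose type would be converted first
}

function partitionByType(
  master: TypedItem,
  others: TypedItem[],
  pref: TypeMismatchPref,
): MergeInput {
  const mismatched = others.filter((i) => i.itemTypeID !== master.itemTypeID);
  if (pref === "skip") {
    // Drop mismatched items from the merge entirely.
    const sameType = others.filter((i) => i.itemTypeID === master.itemTypeID);
    return { toMerge: sameType, toConvert: [] };
  }
  // "convert": merge everything, converting the mismatched items first.
  return { toMerge: [...others], toConvert: mismatched };
}

const masterItem: TypedItem = { id: 1, itemTypeID: 2 };
const otherItems: TypedItem[] = [
  { id: 5, itemTypeID: 2 },
  { id: 6, itemTypeID: 7 },
];

console.log(partitionByType(masterItem, otherItems, "skip").toMerge.map((i) => i.id)); // prints: [ 5 ]
```

With "convert", the same input keeps both items in toMerge and lists item 6 in toConvert.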
Note
mimansajaiswal: where is this error:
Two different maps are being created for the same data The first map uses masterItemPref but isn't actually used The second map ignores masterItemPref initially The selection of master item depends entirely on which action is chosen later
GitHub Copilot: The error is in whenItemsAdded in the Duplicates class:
async whenItemsAdded(duplicatesObj, ids) {
// First map that isn't used
let duplicateItemMap = new Map<number, DuplicateItems>();
for (const id of ids) {
const items = duplicatesObj.getSetItemsByItemID(id);
if (items.length < 2) continue;
// This creates DuplicateItems with masterItemPref but never uses it
const duplicateItems = new DuplicateItems(items, getPref("bulk.master.item") as MasterItem);
duplicateItemMap.set(duplicateItems.key, duplicateItems);
}
// Second map that actually gets used but ignores masterItemPref initially
const duplicateMaps = ids.reduce((acc, id) => {
const existingItemIDs = duplicatesObj.getSetItemsByItemID(id).filter((i: number) => i !== id);
if (existingItemIDs.length > 0) {
acc.set(id, { existingItemIDs, action: defaultAction }); // No masterItemPref here
}
return acc;
}, new Map());
// Later in processDuplicates, the action determines whether masterItemPref is used:
if (action === Action.KEEP) {
// Doesn't use masterItemPref at all
items.push({
masterItem: newItem,
otherItems: existingItemIDs.map((id) => Zotero.Items.get(id)),
});
} else if (action === Action.DISCARD) {
// Only now uses masterItemPref
const duplicateItems = new DuplicateItems(existingItemIDs, masterItemPref);
The fix would be to:
- Remove the unused first map (duplicateItemMap)
- Include masterItemPref consideration when initially finding duplicates, not just at processing time
- Make the master item selection consistent regardless of the action chosen
Would you like me to propose specific code changes to fix this?
Note
mimansajaiswal: what does this mean in duplicate finding? Most methods (except first) have early exits if a single match was already found
GitHub Copilot: Looking at DuplicateFinder, each method (except findByDcReplacesRelation) has this check at the start:
private async findByDOI() {
if (this.candidateItemIDs.length === 1) {
return this; // Early exit if we already found exactly one match
}
// ...rest of method...
}
This early exit pattern exists in:
- findByDOI()
- findBookByISBN()
- findByTitle()
- findByCreators()
- findByYear()
The logic is:
- If we've already found exactly one match during any previous step, stop searching further. This is because:
  - One exact match is the ideal case
  - Additional searching could find false positives
  - Having exactly one match means we found a high-confidence duplicate (like a matching DOI)
- Only findByDcReplacesRelation() doesn't have this check because:
  - It runs first
  - DC relations are explicit links between items
  - We want to find all related items even if there's already one match
This is important because the methods go from most specific/reliable matching (DOI, ISBN) to more fuzzy matching (title, creators). By exiting early when we find an exact match, we avoid potentially incorrect matches from the fuzzier methods.
For example, if we find a DOI match, we don't need to check titles which could match different papers that happen to have similar titles.
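The early-exit behaviour can be sketched as a tiny pipeline. This is a simplified model, not the plugin's code: the Stage type, the runStages helper, and the filter functions are made up for this example, and real stages run SQL queries rather than in-memory filters.

```typescript
// One detection step: takes the current candidates, returns the narrowed list.
interface Stage {
  name: string;
  run: (candidates: number[]) => number[];
}

// Run stages in order, stopping as soon as exactly one candidate remains.
function runStages(
  initial: number[],
  stages: Stage[],
): { candidates: number[]; ran: string[] } {
  let candidates = initial;
  const ran: string[] = [];
  for (const stage of stages) {
    if (candidates.length === 1) break; // early exit: single match already found
    candidates = stage.run(candidates);
    ran.push(stage.name);
  }
  return { candidates, ran };
}

const stageResult = runStages(
  [11, 12, 13],
  [
    // Pretend only item 12 shares the DOI.
    { name: "doi", run: (c) => c.filter((id) => id === 12) },
    // A fuzzier stage that would wrongly drop everything if it ran.
    { name: "title", run: (c) => c.filter((id) => id > 100) },
  ],
);

console.log(stageResult.candidates, stageResult.ran); // prints: [ 12 ] [ 'doi' ]
```

Because the DOI stage narrows the list to a single item, the title stage never runs, which is the behaviour the early-exit check is meant to guarantee.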
Note
mimansajaiswal: is the merging done before or after the item is added?
GitHub Copilot: Looking at the code flow, the merging is done AFTER the item is added. Here's the sequence:
- First, the item is added and triggers the notification system in hooks.ts:
if (toRefresh) {
if (type == "item" && event == "add") {
const { duplicatesObj } = await fetchDuplicates({ libraryID, refresh: true });
await Duplicates.instance.whenItemsAdded(duplicatesObj, ids as number[]);
}
}
- Then whenItemsAdded in the Duplicates class processes it:
async whenItemsAdded(duplicatesObj, ids) {
// Figure out what to do with the already-added item
const defaultAction = getPref("duplicate.default.action") as Action;
// ...existing code...
// Either show dialog or process duplicates
if (defaultAction === Action.ASK) {
await this.showDuplicates(duplicateMaps);
} else {
this.processDuplicates(duplicateMaps);
}
}
- Finally, the actual merge happens in processDuplicates:
async processDuplicates(duplicateMaps) {
// ...existing code...
for (const { masterItem, otherItems } of items) {
selectedItemIDs.push(masterItem.id);
await merge(masterItem, otherItems); // Actual merging happens here
}
// ...existing code...
}
So the workflow is:
- Item gets added to Zotero
- Addition triggers notification
- Plugin detects duplicates
- User chooses action (or default action applies)
- Merging happens afterward if needed
This means the new item fully exists in the database before any duplicate handling occurs. This is important because it means:
- The addition always succeeds first
- No data is lost if the merge fails
- The merge can be undone by restoring from trash
Note
mimansajaiswal: so i want to add 2 more options -- (a) add and then use master prefs, (b) add, ignore item types and use master pref. can you implement these two options in action hooks? the ignore item types will need to modify sql query.
GitHub Copilot: I'll help you implement these two new options. Here's the step-by-step solution:
- First, add the new actions to the prefs enum:
prefs.ts
export enum Action {
// ...existing code...
MASTER = "master", // Add and use master prefs
MASTER_IGNORE_TYPE = "master_ignore_type", // Add, ignore types and use master prefs
}
- Add the new options to preferences UI:
preferences.xhtml
<radiogroup id="zotero-prefpane-__addonRef__-default-action"
preference="__prefsPrefix__.duplicate.default.action" orient="vertical">
<!-- ...existing code... -->
<radio data-l10n-id="pref-default-action-master" value="master" />
<radio data-l10n-id="pref-default-action-master-ignore-type" value="master_ignore_type" />
<!-- ...existing code... -->
</radiogroup>
- Modify the duplicate finding code to support ignoring item types:
duplicateFinder.ts
export class DuplicateFinder {
private readonly item: Zotero.Item;
private readonly itemTypeID: number;
private candidateItemIDs: number[];
private ignoreItemType: boolean;
constructor(item: Zotero.Item | number, ignoreItemType = false) {
// ...existing code...
this.ignoreItemType = ignoreItemType;
}
private buildTypeClause() {
return this.ignoreItemType ? "" : "AND itemTypeID = ?";
}
private async findByDOI() {
// ...existing code...
const typeClause = this.buildTypeClause();
const query = `SELECT DISTINCT itemID
FROM itemDataValues
JOIN itemData USING (valueID)
JOIN items USING (itemID)
LEFT JOIN deletedItems USING (itemID)
WHERE deletedItems.itemID IS NULL
AND libraryID = ?
${typeClause}
AND fieldID IN (${fieldIDInClause})
AND (${partialWhereClause}) ${candidateAndClause};`;
const params = [
this.item.libraryID,
...(this.ignoreItemType ? [] : [this.itemTypeID]),
...fieldIDs,
...doiParams,
...this.candidateItemIDs
];
// ...existing code...
}
// Apply similar changes to other find methods
// ...existing code...
}
- Modify the processing code to handle the new actions:
duplicates.ts
export class Duplicates {
async whenItemsAdded(duplicatesObj, ids) {
const defaultAction = getPref("duplicate.default.action") as Action;
// ...existing code...
const masterItemPref = getPref("bulk.master.item") as MasterItem;
const ignoreItemType = defaultAction === Action.MASTER_IGNORE_TYPE;
// For both new MASTER actions, use DuplicateFinder with appropriate type checking
if (defaultAction === Action.MASTER || defaultAction === Action.MASTER_IGNORE_TYPE) {
for (const id of ids) {
const finder = new DuplicateFinder(id, ignoreItemType);
const candidateIds = await finder.find();
// Assuming find() may return the new item itself, exclude it before recording duplicates
const existingItemIDs = candidateIds.filter((candidateId) => candidateId !== id);
if (existingItemIDs.length > 0) {
duplicateMaps.set(id, {
existingItemIDs,
action: defaultAction
});
}
}
} else {
// Original duplicate finding logic
// ...existing code...
}
// ...existing code...
}
async processDuplicates(duplicateMaps) {
// ...existing code...
for (const [newItemID, { existingItemIDs, action }] of duplicateMaps) {
const newItem = Zotero.Items.get(newItemID);
if (action === Action.KEEP) {
// ...existing code...
} else if (action === Action.DISCARD) {
// ...existing code...
} else if (action === Action.MASTER || action === Action.MASTER_IGNORE_TYPE) {
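// Use the master item preference to pick the master from all items, including the new one
// (read it here, since the variable in whenItemsAdded is not in scope in this method)
const masterItemPref = getPref("bulk.master.item") as MasterItem;
const allItems = [newItem, ...existingItemIDs.map(id => Zotero.Items.get(id))];
const duplicateItems = new DuplicateItems(allItems, masterItemPref);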
const masterItem = duplicateItems.masterItem;
const otherItems = duplicateItems.otherItems;
items.push({
masterItem,
otherItems
});
}
}
// ...existing code...
}
}
This implementation:
- Adds two new action types
- Modifies DuplicateFinder to optionally ignore item types in SQL queries
- Adds new processing logic that uses masterItemPref for all items (including new ones)
- Maintains backward compatibility with existing actions
You'll also need to add appropriate strings to your localization files for the new UI options.
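The important detail in the SQL change is that the optional type clause and its parameter have to be added or dropped together, otherwise the placeholders and the parameter list go out of step. Here is a minimal standalone sketch of that pairing. `buildDoiQuery` and `placeholderCount` are hypothetical helpers, the `value LIKE ?` condition is an assumption standing in for the elided `partialWhereClause`, and the candidate-ID clause is left out for brevity. It also assumes `fieldIDs` and `dois` are non-empty.

```typescript
interface DoiQuery {
  sql: string;
  params: Array<string | number>;
}

// Hypothetical helper: builds the DOI lookup with an optional item-type filter.
function buildDoiQuery(
  libraryID: number,
  itemTypeID: number,
  fieldIDs: number[],
  dois: string[],
  ignoreItemType: boolean,
): DoiQuery {
  const typeClause = ignoreItemType ? "" : "AND itemTypeID = ?";
  const fieldPlaceholders = fieldIDs.map(() => "?").join(", ");
  // Assumed shape of the DOI condition; the original leaves it elided.
  const doiClause = dois.map(() => "value LIKE ?").join(" OR ");
  const sql = [
    "SELECT DISTINCT itemID",
    "FROM itemDataValues",
    "JOIN itemData USING (valueID)",
    "JOIN items USING (itemID)",
    "LEFT JOIN deletedItems USING (itemID)",
    "WHERE deletedItems.itemID IS NULL",
    "AND libraryID = ?",
    typeClause,
    `AND fieldID IN (${fieldPlaceholders})`,
    `AND (${doiClause})`,
  ]
    .filter((line) => line !== "")
    .join("\n");
  // Parameters must follow the same order as the placeholders above.
  const params: Array<string | number> = [
    libraryID,
    ...(ignoreItemType ? [] : [itemTypeID]),
    ...fieldIDs,
    ...dois.map((d) => `%${d}%`),
  ];
  return { sql, params };
}

// Counts the "?" placeholders so they can be compared with params.length.
const placeholderCount = (q: DoiQuery): number => (q.sql.match(/\?/g) ?? []).length;

const strictQuery = buildDoiQuery(1, 22, [59], ["10.1000/abc"], false);
const looseQuery = buildDoiQuery(1, 22, [59], ["10.1000/abc"], true);
console.log(placeholderCount(strictQuery), strictQuery.params.length);
console.log(placeholderCount(looseQuery), looseQuery.params.length);
```

A quick check like `placeholderCount(q) === q.params.length` is a cheap way to catch a find method where the clause was made optional but the parameter was not.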
Would you like me to provide any additional details or explain any part of the implementation further?
Note
mimansajaiswal: you also need to change all ftl files i think?
GitHub Copilot: Yes, you're right! We need to add the new strings to the FTL (Fluent) localization files. Here's what needs to be added:
# ...existing code...
pref-default-action-master =
.label = Use Master Preferences
.accesskey = M
pref-default-action-master-ignore-type =
.label = Use Master Preferences (Ignore Item Types)
.accesskey = T
And for Chinese:
# ...existing code...
pref-default-action-master =
.label = 使用主条目首选项
.accesskey = M
pref-default-action-master-ignore-type =
.label = 使用主条目首选项(忽略条目类型)
.accesskey = T
These strings will correspond to the new radio options we added in the preferences.xhtml file for both actions:
- Master preferences
- Master preferences with type ignoring
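If you want to confirm that every `data-l10n-id` used in the preferences markup has a matching message in a given FTL file, a small check along these lines could help. This is only a sketch: `collectL10nIds`, `collectFtlMessageIds`, and `missingMessages` are hypothetical helpers, and the regex-based parsing is a simplification that only recognizes top-level `id =` message lines, not the full Fluent grammar.

```typescript
// Collect every data-l10n-id attribute value from a markup string.
function collectL10nIds(markup: string): string[] {
  const ids: string[] = [];
  const pattern = /data-l10n-id="([^"]+)"/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markup)) !== null) {
    ids.push(match[1]);
  }
  return ids;
}

// Collect message identifiers from Fluent source: unindented lines of the form `id =`.
function collectFtlMessageIds(ftl: string): string[] {
  return ftl
    .split("\n")
    .map((line) => /^([a-zA-Z][a-zA-Z0-9_-]*)\s*=/.exec(line))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => m[1]);
}

// Report the l10n IDs that the markup uses but the FTL source does not define.
function missingMessages(markup: string, ftl: string): string[] {
  const known = collectFtlMessageIds(ftl);
  return collectL10nIds(markup).filter((id) => known.indexOf(id) === -1);
}

const sampleMarkup = `
<radio data-l10n-id="pref-default-action-master" value="master" />
<radio data-l10n-id="pref-default-action-master-ignore-type" value="master_ignore_type" />`;

// Deliberately incomplete: the second message is missing.
const sampleFtl = `
pref-default-action-master =
    .label = Use Master Preferences
    .accesskey = M`;

console.log(missingMessages(sampleMarkup, sampleFtl));
```

Running the check once per locale file would flag any translation that was missed when the new options were added.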
Would you like me to show any other localization files that need to be modified?
Note
mimansajaiswal: can the ignore item types be a checkbox for actions that is always considered for any action based off whether it is true or false? can you modify all files to make it this.
GitHub Copilot: Yes, we can make "ignore item types" a global checkbox that applies to any action. Here's how to modify the relevant files:
prefs.ts
Add a getter for the new preference and drop MASTER_IGNORE_TYPE from the enum:
// ...existing code...
export function ignoreItemTypes() {
return getPref("duplicate.ignore.types") as boolean;
}
export enum Action {
KEEP = "keep",
DISCARD = "discard",
CANCEL = "cancel",
ASK = "ask",
MASTER = "master", // Just use master prefs
// Remove MASTER_IGNORE_TYPE as it's now a global setting
}
// ...existing code...
prefs.js
Add new preference:
/* eslint-disable no-undef */
pref("__prefsPrefix__.duplicate.default.action", "ask");
pref("__prefsPrefix__.bulk.master.item", "oldest");
pref("__prefsPrefix__.duplicate.stats.enable", true);
pref("__prefsPrefix__.duplicate.type.mismatch", "skip");
pref("__prefsPrefix__.duplicate.ignore.types", false); // Add this line
preferences.xhtml
Add checkbox and reorganize preferences:
<vbox class="pref-section">
<groupbox>
<label>
<html:h2 data-l10n-id="pref-action-title"></html:h2>
</label>
<label data-l10n-id="pref-default-action-description" />
<radiogroup id="zotero-prefpane-__addonRef__-default-action"
preference="__prefsPrefix__.duplicate.default.action" orient="vertical">
<radio data-l10n-id="pref-default-action-keep-this" value="keep" />
<radio data-l10n-id="pref-default-action-keep-others" value="discard" />
<radio data-l10n-id="pref-default-action-keep-all" value="cancel" />
<radio data-l10n-id="pref-default-action-always-ask" value="ask" />
<radio data-l10n-id="pref-default-action-master" value="master" />
</radiogroup>
<checkbox id="zotero-prefpane-__addonRef__-ignore-types"
data-l10n-id="pref-ignore-item-types"
preference="__prefsPrefix__.duplicate.ignore.types" />
</groupbox>
</vbox>
<!-- ...existing code... -->
FTL localization files
Add the checkbox label (English shown):
# ...existing code...
pref-ignore-item-types =
.label = Ignore item types when detecting duplicates
.accesskey = I
# ...existing code...
duplicates.ts
Use the preference when creating DuplicateFinder instances:
async whenItemsAdded(duplicatesObj, ids) {
const defaultAction = getPref("duplicate.default.action") as Action;
const ignoreTypes = ignoreItemTypes();
// ...existing code...
// Use a for...of loop here: await is not allowed inside a synchronous reduce callback
const duplicateMaps = new Map();
for (const id of ids) {
// Create new DuplicateFinder with ignoreTypes flag from preferences
const finder = new DuplicateFinder(id, ignoreTypes);
const candidateIds = await finder.find();
if (candidateIds.length > 1) {
duplicateMaps.set(id, {
existingItemIDs: candidateIds.filter(i => i !== id),
action: defaultAction
});
}
}
// ...existing code...
}
This implementation:
- Makes "ignore item types" a global checkbox preference
- Applies to all actions (keep, discard, master, etc.)
- Simplifies the action types by removing the redundant MASTER_IGNORE_TYPE
- Uses the preference when creating DuplicateFinder instances
The changes make the feature more intuitive and maintainable, since the item type ignoring is now a separate concern from the action choice.
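As a standalone illustration of that separation, here is a minimal sketch in which the action and the ignore-types flag are passed independently, and the asynchronous finder is awaited inside a `for...of` loop. The `DefaultAction`, `DuplicateEntry`, `FindFn`, `collectDuplicates`, and `fakeFind` names are hypothetical. The finder is assumed to return candidate IDs that include the item itself, which matches the `length > 1` check used above.

```typescript
type DefaultAction = "keep" | "discard" | "cancel" | "ask" | "master";

interface DuplicateEntry {
  existingItemIDs: number[];
  action: DefaultAction;
}

// Hypothetical finder signature: resolves to candidate IDs, including the item itself.
type FindFn = (id: number, ignoreTypes: boolean) => Promise<number[]>;

async function collectDuplicates(
  ids: number[],
  action: DefaultAction,
  ignoreTypes: boolean,
  find: FindFn,
): Promise<Map<number, DuplicateEntry>> {
  const result = new Map<number, DuplicateEntry>();
  // for...of keeps the awaits sequential; a synchronous reduce callback cannot await.
  for (const id of ids) {
    const candidateIds = await find(id, ignoreTypes);
    if (candidateIds.length > 1) {
      result.set(id, {
        existingItemIDs: candidateIds.filter((c) => c !== id),
        action,
      });
    }
  }
  return result;
}

// Fake finder for illustration: item 10 (say, a book) and item 11 (a book section)
// share an identifier, so they only match when item types are ignored.
const fakeFind: FindFn = async (id, ignoreTypes) =>
  id === 11 && ignoreTypes ? [10, 11] : [id];

collectDuplicates([11], "master", true, fakeFind).then((found) => {
  console.log(found.get(11));
});
```

With the flag off, the same call returns an empty map for any action, which is the behavior the checkbox is meant to toggle.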
Would you like me to show any other changes needed to implement this?