
[PDF Generator] Add Crossmark button to article PDFs #1115

Draft
Copilot wants to merge 2 commits into master from copilot/add-crossmark-button-to-pdfs

Copilot AI commented Mar 5, 2026

Adds Crossmark button support to PDFs generated by packtools: a clickable logo on the first page that links to the Crossref dialog, plus full XMP metadata embedded in the PDF.

What does this PR do?

  • New module packtools/sps/formats/pdf/crossmark.py with the main function:
    add_crossmark(
        input_pdf="artigo.pdf",
        output_pdf="artigo_cm.pdf",
        doi="10.1590/s0100-12345",
        date_stamp="2026-01-15",
        logo_path=None,        # uses the bundled logo by default
        position="top-right",  # top-right | top-left | bottom-right | bottom-left
        width=150,             # typographic points
    )
  • Logo overlay on the first page via reportlab; the remaining pages are left untouched
  • Clickable URI annotation (https://crossmark.crossref.org/dialog?doi=...&domain=pdf&date_stamp=...) via pypdf
  • XMP metadata with fields in the dc, prism, crossmark, and pdfx namespaces; existing XMP is preserved via merge
  • crossmark_pdf CLI (new entry point) with a single-file mode and a CSV batch mode:
    # single
    crossmark_pdf --input artigo.pdf --doi 10.1590/s0100-12345 --date-stamp 2026-01-15
    
    # batch
    crossmark_pdf --csv batch.csv   # columns: doi, input_pdf, date_stamp[, output_pdf]
  • Crossmark logo bundled at packtools/sps/formats/pdf/assets/CROSSMARK_Color_horizontal.png
  • New dependencies: pypdf>=3.0.0, reportlab>=3.6.0
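
The position and width options determine where the logo and its clickable area land on the first page. The geometry might look roughly like the sketch below, assuming PDF user-space coordinates (origin at the bottom-left corner, units in points). The helper name `logo_rect`, the 20-point margin, and the aspect-ratio argument are hypothetical and are not taken from the PR.

```python
def logo_rect(page_width, page_height, position="top-right",
              width=150, aspect_ratio=0.25, margin=20):
    """Return (x0, y0, x1, y1) for the logo, in PDF points.

    Hypothetical helper: the margin and aspect ratio are illustrative
    assumptions, not values from packtools.
    """
    vertical, _, horizontal = position.partition("-")
    if vertical not in ("top", "bottom") or horizontal not in ("left", "right"):
        raise ValueError(f"unsupported position: {position!r}")

    height = width * aspect_ratio
    # PDF coordinates start at the bottom-left corner of the page.
    x0 = margin if horizontal == "left" else page_width - margin - width
    y0 = margin if vertical == "bottom" else page_height - margin - height
    return (x0, y0, x0 + width, y0 + height)


# A4 page (595 x 842 points), default top-right placement.
print(logo_rect(595, 842))  # (425, 784.5, 575, 822.0)
```

The same rectangle could serve both for drawing the image and for the link annotation, so the clickable area matches the visible logo.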

Where could the review start?

packtools/sps/formats/pdf/crossmark.py: the complete module, containing all the insertion, XMP, and CLI logic.

How could this be tested manually?

pip install pypdf reportlab

# single-file
crossmark_pdf \
  --input artigo.pdf \
  --output artigo_cm.pdf \
  --doi 10.1590/s0100-12345 \
  --date-stamp 2026-01-15

# Check in Adobe Reader / Evince: logo visible in the top-right corner,
# clickable, opening https://crossmark.crossref.org/dialog?doi=...
# Inspect XMP: exiftool artigo_cm.pdf | grep -i crossmark

Any background context you want to provide?

Crossmark is a requirement of indexers (Crossref, PubMed) for indicating that a PDF is the canonical/up-to-date version. The overlay approach ensures that the original PDF content is not altered: only the first page receives the logo and the annotation, and the content stream of the subsequent pages is preserved in full. The XMP merge preserves pre-existing metadata.

Screenshots

N/A: the output is a PDF file. Validate with exiftool or by inspecting annotations with pypdf.

What are the relevant tickets?

Related to the issue about adding a Crossmark button to article PDFs.

References

Original prompt

This section details the original issue you should resolve

<issue_title>[PDF Generator] Add Crossmark button to article PDFs</issue_title>
<issue_description>## Context

Crossmark is a Crossref service that lets readers check whether the content they are reading is the most current version or whether there have been corrections, retractions, or updates. To enable this, a Crossmark button/logo must be inserted into the PDFs with an article-specific link.

Reference: https://www.crossref.org/documentation/crossmark/

Objective

Create a function that inserts the Crossmark logo into article PDFs, with a link to the Crossmark page for the respective DOI.

Requirements

Logo insertion

  • Insert the Crossmark logo (SVG or PNG format) at a configurable position in the PDF (default: top-right corner of the first page).
  • The logo must carry a hyperlink in the format:
    https://crossmark.crossref.org/dialog?doi={DOI}&domain=pdf&date_stamp={DATA}
    
    • {DOI}: the article DOI (e.g., 10.1590/s0100-12345)
    • {DATA}: date of the last significant version of the PDF, in YYYY-MM-DD format
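
The date placeholder has to be a real calendar date in YYYY-MM-DD form. One way to reject malformed values before they reach the link is sketched below with the standard library only; `validate_date_stamp` is a hypothetical name that appears in neither the issue nor the PR.

```python
from datetime import datetime


def validate_date_stamp(date_stamp):
    """Return date_stamp unchanged if it is a real YYYY-MM-DD date.

    Hypothetical helper, shown only to illustrate the format rule.
    """
    try:
        parsed = datetime.strptime(date_stamp, "%Y-%m-%d")
    except ValueError as exc:
        raise ValueError(f"date_stamp must be YYYY-MM-DD: {date_stamp!r}") from exc
    # strptime also accepts "2026-1-5"; require the zero-padded form.
    if parsed.strftime("%Y-%m-%d") != date_stamp:
        raise ValueError(f"date_stamp must be zero-padded: {date_stamp!r}")
    return date_stamp


print(validate_date_stamp("2026-01-15"))  # 2026-01-15
```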

XMP metadata (optional but desirable)

  • Add/update XMP metadata in the PDF with the fields:
    • dc:identifier: doi:{DOI}
    • prism:doi: {DOI}
    • prism:url: https://doi.org/{DOI}
    • crossmark:MajorVersionDate: {DATA}
    • crossmark:DOI: {DOI}
    • pdfx:doi: {DOI}
    • pdfx:CrossmarkMajorVersionDate: {DATA}

The fields must exist in both the crossmark and pdfx namespaces (a requirement for indexers that read the PDF dictionary).
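
The field list can be kept as a table of templates and expanded per article. The following is a minimal sketch with hypothetical names (`XMP_FIELD_TEMPLATES`, `xmp_field_values`); it only computes the values and leaves XMP serialization aside.

```python
# Field names and value patterns copied from the requirement list;
# the mapping and function names are hypothetical.
XMP_FIELD_TEMPLATES = {
    "dc:identifier": "doi:{doi}",
    "prism:doi": "{doi}",
    "prism:url": "https://doi.org/{doi}",
    "crossmark:MajorVersionDate": "{date_stamp}",
    "crossmark:DOI": "{doi}",
    "pdfx:doi": "{doi}",
    "pdfx:CrossmarkMajorVersionDate": "{date_stamp}",
}


def xmp_field_values(doi, date_stamp):
    """Expand every template for one article."""
    return {
        field: template.format(doi=doi, date_stamp=date_stamp)
        for field, template in XMP_FIELD_TEMPLATES.items()
    }


for field, value in xmp_field_values("10.1590/s0100-12345", "2026-01-15").items():
    print(f"{field} = {value}")
```

Keeping the crossmark and pdfx entries in the same table makes it harder for the two namespaces to drift apart.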

Interface

  • Suggested signature for the main function:
    def add_crossmark(
        input_pdf: str,
        output_pdf: str,
        doi: str,
        date_stamp: str,
        logo_path: str = "CROSSMARK_Color_horizontal.png",
        position: str = "top-right",
        width: int = 150,
    ) -> None:
  • Support CLI usage:
    python add_crossmark.py --input artigo.pdf --output artigo_cm.pdf --doi 10.1590/s0100-12345 --date-stamp 2026-01-15
  • Support batch processing from a CSV (doi,input_pdf,date_stamp).
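
Batch mode needs to read one job per CSV row and fail early when a required column is missing. A rough sketch of that step using the standard `csv` module follows; `read_batch_rows` and `REQUIRED_COLUMNS` are hypothetical names, and the actual PDF processing is left out.

```python
import csv
import io

# Column names come from the issue text; everything else is illustrative.
REQUIRED_COLUMNS = ("doi", "input_pdf", "date_stamp")


def read_batch_rows(csv_file):
    """Yield one dict per CSV row, checking the required columns first."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {', '.join(missing)}")
    for row in reader:
        # Short rows yield None for absent cells; normalize them to "".
        yield {column: (row[column] or "").strip() for column in REQUIRED_COLUMNS}


sample = io.StringIO(
    "doi,input_pdf,date_stamp\n"
    "10.1590/s0100-12345,artigo.pdf,2026-01-15\n"
)
for row in read_batch_rows(sample):
    print(row)
```

Because the function is a generator, the column check runs when iteration starts, not when the function is called.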

Constraints

  • Do not alter the existing PDF content (text, images, pagination).
  • The logo must be clickable (working hyperlink in readers such as Adobe Reader, Evince, Preview).
  • Use open-source libraries (suggestions: pikepdf, pypdf, reportlab).

Acceptance criteria

  • Logo visible on the first page of the output PDF
  • The logo link opens the correct Crossmark URL with DOI and date_stamp
  • XMP metadata present and valid in the output PDF
  • CLI mode working with --input, --output, --doi, --date-stamp
  • Batch mode working with --csv
  • Tests with at least 3 PDFs of different sizes/layouts
  • No changes to the original PDF content
  • Usable during PDF generation
  • Usable on any existing PDF</issue_description>

Comments on the Issue (you are @copilot in this section)


Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add Crossmark button to articles' PDFs" to "[PDF Generator] Add Crossmark button to article PDFs" on Mar 5, 2026
Copilot AI left a comment

Pull request overview

This PR adds Crossmark button support to PDFs generated by packtools. Crossmark is a Crossref service that lets readers verify whether a document is the latest version; the implementation overlays a clickable logo on the first page with a URI annotation and embeds XMP metadata with DOI and versioning fields.

Changes:

  • New module packtools/sps/formats/pdf/crossmark.py with add_crossmark() function, XMP helpers, logo overlay logic, and a main() CLI entry point
  • New crossmark_pdf console-script entry point and pypdf/reportlab dependencies added to setup.py
  • Bundled Crossmark logo PNG asset and comprehensive test suite covering unit and integration cases

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 6 comments.

File | Description
packtools/sps/formats/pdf/crossmark.py | Core module: logo overlay via reportlab, URI annotation via pypdf, XMP metadata merge, and CLI
packtools/sps/formats/pdf/assets/CROSSMARK_Color_horizontal.png | Bundled Crossmark logo for use as default logo
setup.py | Adds pypdf>=3.0.0, reportlab>=3.6.0 to install_requires and crossmark_pdf console script
tests/sps/formats/pdf/test_crossmark.py | Unit and integration tests for all public and helper functions

import unittest
from unittest.mock import patch, MagicMock

from pypdf import PdfReader, PdfWriter
Copilot AI Mar 6, 2026

PdfWriter is imported but never directly used in the test file (it is only used indirectly via the add_crossmark function under test). This unused import should be removed.

Suggested change
- from pypdf import PdfReader, PdfWriter
+ from pypdf import PdfReader

Copilot uses AI. Check for mistakes.
Comment on lines +170 to +175
for prefix, uri in _XMP_NAMESPACES.items():
    ElementTree.register_namespace(prefix, uri)
ElementTree.register_namespace("x", "adobe:ns:meta/")
ElementTree.register_namespace(
    "rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
)
Copilot AI Mar 6, 2026

ElementTree.register_namespace() modifies a module-level registry in the Python standard library that is shared by every thread in the process. In a multi-threaded environment (e.g., when a large batch CSV is processed concurrently), concurrent calls to _merge_xmp_packet could race on this shared state, potentially leading to namespace prefix mangling in serialized XML output. Separate worker processes each hold their own copy of the registry and are not affected. Consider either using lxml (which has per-tree namespace handling) or protecting this block with a module-level lock if concurrent use is expected.
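
The lock-based option could look roughly like the sketch below. It registers the prefixes once per process and serializes concurrent callers; the names `ensure_namespaces_registered`, `_EXTRA_NAMESPACES`, and `_NAMESPACE_LOCK` are hypothetical, and only the two prefixes visible in the reviewed snippet are included.

```python
import threading
from xml.etree import ElementTree

# Prefixes and URIs taken from the snippet under review.
_EXTRA_NAMESPACES = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
_NAMESPACE_LOCK = threading.Lock()
_namespaces_registered = False


def ensure_namespaces_registered():
    """Register the prefixes once, serializing concurrent callers."""
    global _namespaces_registered
    with _NAMESPACE_LOCK:
        if not _namespaces_registered:
            for prefix, uri in _EXTRA_NAMESPACES.items():
                ElementTree.register_namespace(prefix, uri)
            _namespaces_registered = True


ensure_namespaces_registered()
root = ElementTree.Element("{adobe:ns:meta/}xmpmeta")
print(ElementTree.tostring(root))
```

This only guards the registration step. Serialization in other threads still reads the shared registry, so registering once at import time, before any worker starts, may be the simpler choice.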

bytes: UTF-8 encoded XMP packet.
"""
fields_xml = "\n".join(
f" <{field}>{value.format(doi=doi, date_stamp=date_stamp)}</{field}>"
Copilot AI Mar 6, 2026

The DOI is interpolated directly into an XML string without escaping XML special characters. If a DOI contains &, <, or > (e.g., a DOI with an ampersand, which is technically valid), this will produce malformed XML that could fail to parse or embed corrupt XMP into the PDF.

The DOI and date_stamp values should be XML-escaped (e.g., using xml.sax.saxutils.escape()) before being inserted into the manually-built XML string.
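
As a small illustration of the suggested fix, `xml.sax.saxutils.escape()` replaces `&`, `<`, and `>` before the value is placed in the element text. The helper name `xmp_field_xml` and the sample DOI are hypothetical.

```python
from xml.sax.saxutils import escape


def xmp_field_xml(field, value):
    """Render one XMP field with XML special characters escaped.

    Hypothetical helper; `field` is assumed to be a trusted tag name.
    """
    return f"<{field}>{escape(value)}</{field}>"


print(xmp_field_xml("prism:doi", "10.1000/a&b<c>"))
# <prism:doi>10.1000/a&amp;b&lt;c&gt;</prism:doi>
```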

first_page.merge_page(overlay_page)

# --- Add URI annotation (clickable hyperlink) on first page ---
crossmark_url = _CROSSMARK_URL.format(doi=doi, date_stamp=date_stamp)
Copilot AI Mar 6, 2026

The DOI is interpolated directly into the URL query string without percent-encoding. DOIs can contain characters such as #, +, or spaces, which are significant in a URL query string and would produce an incorrect or broken Crossmark link (e.g., a # would be interpreted as the start of a URL fragment, truncating the query string entirely).

The DOI should be URL-encoded before being placed in the query string, e.g., using urllib.parse.quote(doi, safe="") or urllib.parse.urlencode({"doi": doi, "domain": "pdf", "date_stamp": date_stamp}).
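
A minimal sketch of the `urlencode` variant, using only the standard library; `crossmark_url` and `CROSSMARK_DIALOG` are hypothetical names, while the base URL and the query parameters come from the issue text.

```python
from urllib.parse import urlencode

# Base URL from the issue text; the constant name is illustrative.
CROSSMARK_DIALOG = "https://crossmark.crossref.org/dialog"


def crossmark_url(doi, date_stamp):
    """Build the Crossmark dialog link with percent-encoded query values."""
    query = urlencode({"doi": doi, "domain": "pdf", "date_stamp": date_stamp})
    return f"{CROSSMARK_DIALOG}?{query}"


print(crossmark_url("10.1590/s0100-12345", "2026-01-15"))
# https://crossmark.crossref.org/dialog?doi=10.1590%2Fs0100-12345&domain=pdf&date_stamp=2026-01-15
```

By default `urlencode` also encodes the slash in the DOI as `%2F`. If the literal slash should be kept, as in the format shown in the issue, passing `safe="/"` to `urlencode` is one option.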

generic.NameObject("/Type"): generic.NameObject("/Metadata"),
generic.NameObject("/Subtype"): generic.NameObject("/XML"),
})
xmp_ref = writer._add_object(xmp_stream)
Copilot AI Mar 6, 2026

writer._add_object() is a private/internal pypdf API (denoted by the leading underscore). Private APIs are not part of the public contract and may be renamed, removed, or have their behavior changed in any minor or patch version of pypdf without notice. The public equivalent writer.add_object() was introduced in pypdf 3.x and should be used instead to ensure forward compatibility.

Suggested change
- xmp_ref = writer._add_object(xmp_stream)
+ xmp_ref = writer.add_object(xmp_stream)

import csv
import tempfile
import unittest
from unittest.mock import patch, MagicMock
Copilot AI Mar 6, 2026

MagicMock is imported but never used in the test file. This is a dead import that should be removed to keep the test file clean.

Suggested change
- from unittest.mock import patch, MagicMock
+ from unittest.mock import patch

Development

Successfully merging this pull request may close these issues.

[PDF Generator] Add Crossmark button to article PDFs

3 participants